Big Data and Hadoop

What is Big Data?

Big Data is data that is challenging for traditional data processing platforms, such as relational database systems, to collect, manage, and process with the desired efficiency.
To solve many Big Data problems, we need to deploy a Big Data solution.
But keep in mind that the design, implementation, and deployment of a Big Data platform require a clear definition of the Big Data problem by system architects and administrators.
There are many Big Data platforms available in the market; below you will find some of them.
1. Apache Spark
2. Apache Storm
3. Google BigQuery
4. Ceph
5. DataTorrent RTS
6. Disco
7. Pachyderm
8. Presto
9. Hydra
10. Misco
11. Qizmt
12. MongoDB
...and many more.

What is Hadoop?

Hadoop is, at its core, a data storage and processing engine.
Since different Big Data problems have different properties, a Hadoop-based Big Data platform is capable of dealing with most Big Data problems, but it might not be a good fit for others. Because of these and many other reasons, we sometimes need to choose a Hadoop alternative.
The alternative could be any of the Big Data platforms listed above.
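
To make "storage and processing engine" concrete, here is a minimal sketch of the classic WordCount job written against Hadoop's MapReduce API, very close to the standard Hadoop tutorial example. The input and output HDFS paths passed in args are assumptions for illustration: the files live on HDFS (the storage side), and the mapper/reducer pair processes them in parallel across the cluster (the processing side).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in its input split stored on HDFS.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts for each word across all mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, a job like this is typically launched with: hadoop jar wordcount.jar WordCount <hdfs-input-dir> <hdfs-output-dir>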

Does a DBA need to learn Hadoop? If yes, does he need to learn Java to get into Big Data?


Yes, a DBA should learn Hadoop and other Big Data technologies; it is a growing demand of the business. Everyone knows DBA stands for "Default Blame Acceptor". Since the database is always blamed, DBAs typically have great troubleshooting skills, processes, and instincts. All of these are critical for good cluster admins,
and therefore a DBA or system admin can become a really good cluster admin.
Due to the growing demand for and deployment of Hadoop, there are Hadoop cluster administrator jobs in the market, and a DBA or sysadmin can really cash in on this opportunity by learning Hadoop. For such jobs there is no hard and fast requirement to learn Java.


What are Data Properties?

Ideally, data has the following three important properties: volume, velocity, and variety. Another property we can include is value.
I will discuss each of them here to define the Big Data problem.

Defining a Big Data problem

To define a Big Data problem, you need to consider the following steps:

 Estimate the volume of data.

  There are two types of data in the real world: static data (for example, national census data) and nonstatic data (for example, social network streaming data). While estimating the volume of data, you must also consider its future growth.
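
  As a rough illustration, a back-of-the-envelope volume projection might look like the following Java sketch. All the figures in it are made-up assumptions, not measurements:

public class VolumeEstimate {
    public static void main(String[] args) {
        // Assumed figures, for illustration only.
        double currentTb = 40.0;        // data already collected, in TB
        double dailyIngestTb = 0.5;     // new data arriving per day, in TB
        double replicationFactor = 3.0; // HDFS default replication factor
        int horizonDays = 365 * 2;      // plan two years ahead

        double rawTb = currentTb + dailyIngestTb * horizonDays;
        double storedTb = rawTb * replicationFactor;

        System.out.printf("Raw data after 2 years: %.1f TB%n", rawTb);
        System.out.printf("HDFS capacity needed (x%.0f replication): %.1f TB%n",
                replicationFactor, storedTb);
    }
}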

 Estimate the velocity of data

   The velocity estimate should include how much data can be generated within a certain amount of time, for example during a day. For static data, the velocity is zero. This property not only affects the volume of data but also determines how fast a data processing system should handle the data.
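   A similarly rough velocity calculation turns an assumed event rate into the daily ingest figure the processing system must sustain. Again, the numbers are illustrative assumptions:

public class VelocityEstimate {
    public static void main(String[] args) {
        // Assumed figures, for illustration only.
        long eventsPerSecond = 20_000;  // e.g., peak click-stream event rate
        long bytesPerEvent = 500;       // average serialized event size

        double bytesPerDay = (double) eventsPerSecond * bytesPerEvent * 86_400;
        double gbPerDay = bytesPerDay / (1024.0 * 1024.0 * 1024.0);

        // The processing system must keep up with this rate, or a backlog builds.
        System.out.printf("Ingest rate: %.1f GB/day%n", gbPerDay);
    }
}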

Identify the data variety

  Data variety means the different sources of data, such as web click data, social network data, data in relational databases, and so on.
  Each variety of data requires specifically designed modules to integrate it into the Big Data platform.
  For example, a web crawler is needed for getting data from the Web, and a data translation module is needed to transfer data from relational databases to a nonrelational Big Data platform.
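
  To show what such a translation module boils down to, here is a minimal Java sketch that reads rows from a relational table over JDBC and writes them as CSV lines into HDFS. The connection URL, credentials, table, and column names are all hypothetical placeholders, and a dedicated tool such as Apache Sqoop is usually used for this kind of transfer at scale; the sketch only illustrates the idea.

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DbToHdfs {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; replace with your own.
        // The matching JDBC driver jar must be on the classpath.
        String jdbcUrl = "jdbc:mysql://dbhost:3306/sales";
        String user = "etl";
        String password = "secret";

        // Assumes the Hadoop configuration (core-site.xml) is on the classpath,
        // so FileSystem.get() resolves to the cluster's HDFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/data/sales/orders.csv");

        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, amount, created_at FROM orders");
             BufferedWriter out = new BufferedWriter(
                     new OutputStreamWriter(fs.create(target), StandardCharsets.UTF_8))) {
            // Flatten each relational row into one CSV line on HDFS.
            while (rs.next()) {
                out.write(rs.getLong("id") + "," + rs.getBigDecimal("amount")
                        + "," + rs.getTimestamp("created_at"));
                out.newLine();
            }
        }
    }
}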

 Define the expected value of data

  The value property of Big Data defines what we can potentially derive from Big Data and how we can use it.


