Sunday, 20 September 2015

Big Data Broken Down

Big Data is any data that is difficult to store or process in real time with the storage and computing power available. We live in an era in which data is generated at a very high rate, from many different sources and in many different forms.
In this post we will talk about the features of Big Data. The 6 V’s of Big Data are:
Variability: Suppose you go to a restaurant and buy the same burger every day, but it tastes different every time. Variability refers to context: the same or similar text can mean different things in different contexts. Understanding those different meanings is still a challenge for algorithms.
Volume: Volume is the biggest challenge posed to traditional data storage and processing systems. The volume of data we deal with today is very high and keeps multiplying in real time. Data is now generated in terabytes (TB) and even petabytes (PB).
Velocity: Data is generated in real time by sources such as logs and sensors, and much of it is time-sensitive. All of this data needs to be worked on in real time as businesses enter the phase of real-time decision-making. For example: is a particular credit card transaction safe to process, or does it need to be declined? With Big Data, the banking industry is now able to understand consumer patterns better and make safer bets on these transactions.
Veracity: Data, be it good, bad, incomplete or undefined, is useful for analysis. Every piece of data carries some information of value. Big Data provides vast samples against which to test and analyse hypotheses; here we get data for analysis of the whole population.
Volatility: In the world of Big Data, one needs to know how long data remains valid and how long it needs to be stored. You need to understand when data becomes invalid for the current analysis. For example, a bank might decide that a particular type of data generated at a particular time does not necessarily reflect on the credibility of a credit card holder. It is extremely important for the bank to understand this, so that it does not lose good business while avoiding bad business.
Variety: Variety refers to the different types of data sources that generate different types of data, be it structured or unstructured. Data comes in different forms, such as images, videos, logs and XML files. Unstructured data is difficult to store and analyse in traditional systems.
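To make the Volatility point a little more concrete, here is a minimal sketch in Python of dropping data that is too old to be valid for the current analysis. The 90-day window, the record layout and the function name `still_valid` are assumptions made up for illustration; a real retention rule would come from the business itself.

```python
from datetime import datetime, timedelta

# Hypothetical retention window: how long a record stays valid for analysis.
RETENTION = timedelta(days=90)

def still_valid(records, now, retention=RETENTION):
    """Keep only records whose timestamp falls inside the retention window."""
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]

# Illustrative records only, not real data.
now = datetime(2015, 9, 20)
records = [
    {"id": 1, "timestamp": datetime(2015, 9, 1)},
    {"id": 2, "timestamp": datetime(2015, 1, 15)},
]

valid = still_valid(records, now)
print([r["id"] for r in valid])  # prints [1]
```

The record from January falls outside the window and is left out of the analysis, while the recent one is kept.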
Major organisations across the world are now exploring how to store, manage and process their Big Data on a more economical and feasible platform for better analysis and decision-making. Apache Hadoop is currently the market leader, enabling these organisations to make a smoother transition. However, a major challenge remains: finding suitably skilled professionals who can develop applications on Hadoop or build the new data architecture. Distributed computing across a number of machines in a cluster is what gives Hadoop an edge over traditional database management systems.
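The idea of spreading work across machines can be sketched with the map and reduce steps that Hadoop's MapReduce model is built on. The snippet below is plain Python running on a single machine, not the Hadoop API: each string in `chunks` stands in for the slice of data one node would hold, and the names `map_phase` and `reduce_phase` are chosen here for illustration only.

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Count the words in one chunk, as a single worker node would."""
    return Counter(chunk.lower().split())

def reduce_phase(left, right):
    """Merge two partial word counts into one."""
    return left + right

# Each string stands in for the data held by one machine in the cluster.
chunks = [
    "big data needs big storage",
    "data arrives at high velocity",
]

# Map: every chunk is counted independently (in a real cluster, in parallel).
partials = [map_phase(c) for c in chunks]

# Reduce: the partial results are combined into the final answer.
totals = reduce(reduce_phase, partials, Counter())
print(totals["data"], totals["big"])  # prints 2 2
```

Because each chunk is counted independently, adding more machines lets more chunks be processed at the same time, and only the small partial counts need to be brought together at the end.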
