So this post is wayyyy past due, but I decided to write it anyway. In the first part of ‘Let’s talk data’, I attempted to bring some clarity to the Big Data hype by explaining what Big Data is, and why it is important. Here, I want to share some guidelines for leveraging Big Data to derive competitive advantage or some other form of business value.
Like most ‘new’ concepts, Big Data strategies aren’t exactly new. Rather, they build upon the existing pillars of data and information management, shown below:
1. Data Acquisition: Since you cannot leverage data that you don’t have, an important part of crafting a big data strategy is data acquisition. What separates big data acquisition strategies from those of traditional information management is that here, virtually ALL your data has to be acquired. The reason is that although most of this data may not be very useful on its own, it can be grouped together and converted into knowledge. In crafting a big data acquisition strategy, particular attention has to be paid to the issue of scale, since the chosen hardware and software must be able to handle the processing of this high-volume, low-density data effectively.
2. Data Organization: In transaction processing (OLTP) systems, the key consideration for data organization is to facilitate the rapid insertion and updating of records in the database. This is usually achieved with entity-relationship (ER) models. In crafting a big data organization strategy, ER diagrams are largely irrelevant, as the key objective is to transform the data in a manner that allows all or most of it to be analyzed at once. This is no small feat, and is mostly achieved through massively parallel hardware.
3. Data Analysis: As stated above, analysis of big data requires access to most of the data at the same time. As user-friendly big data analysis tools are not yet readily available (and SQL queries won’t help), data access is usually done via programming in languages like R. Also, since the relationships between the data elements are not obvious, statistical knowledge is usually needed to sift through the data and identify patterns and trends. The combination of these requirements has given birth to a new generation of professionals who blend development skills with statistical knowledge: Data Scientists. If you think they’re scarce and expensive, you’re right. 🙂
4. Decision Making: Proper analysis of big data is likely to uncover actionable insights from which organizations could derive competitive advantage or some other form of business value. These insights are usually exposed to executives via suitable dashboards. As in traditional information management, the key consideration here is that the insights should be quickly and easily accessible to executives.
These days, it is virtually impossible to talk about big data without mentioning Hadoop, an open-source framework that supports data-intensive distributed applications (read: big data), and was inspired by Google’s papers on the MapReduce programming model and the Google File System. A goldmine of Hadoop information can be found here.
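To make the MapReduce idea a bit more concrete, here is a minimal sketch of the classic word-count example in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are made up for illustration. The point is only to show the shape of the model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. On a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework would do this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, purely for illustration.
docs = ["big data is big", "data is everywhere"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what lets this approach scale to data that won’t fit on one box.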
Image by Teradata