
Big Data: Correlations, Not Cause-and-Effect

Posted in Trends & Technologies.

Image by Marcos Gasparutti, CC license

In their recently published book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schönberger and Kenneth Cukier argue that big data will yield large volumes of information that can be used to establish correlations, though not necessarily precise cause and effect.

But that might be good enough to extract the value you need from big data.

Three examples from their book:

  1. Walmart discovered that Pop-Tarts sales spiked when storms were in the forecast. The same correlation held for flashlights, but selling more flashlights made sense; selling more Pop-Tarts didn’t.
  2. Doctors in Canada now prevent fevers in premature infants because of a link between a period when a baby’s vital signs are unusually stable and a severe fever 24 hours later.
  3. Credit scores can be used to predict which people need to be reminded to take a prescription medicine.

Why did the people involved in the examples above compare such different sets of data? One possible reason: because they could, relatively quickly and at low cost, thanks to superfast data processing and cheap memory. If you could mash together all kinds of data in large volumes, and do so relatively cheaply, why wouldn’t you keep going until you found some correlations that looked interesting?
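
As a minimal sketch of that kind of exploratory mash-up, the Python below computes the Pearson correlation for every pair of columns in a small table and ranks the pairs by strength. The column names and numbers are invented for illustration (they are not figures from the book), and the helper names `pearson` and `rank_correlations` are my own. A strong coefficient here still says nothing about cause.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def rank_correlations(columns):
    """Return (name_a, name_b, r) for every column pair, strongest first."""
    results = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(results, key=lambda t: abs(t[2]), reverse=True)


# Hypothetical daily figures for one store: made-up numbers, for illustration only.
data = {
    "storm_forecast": [0, 0, 1, 1, 0, 1, 0, 0],
    "pop_tart_units": [20, 22, 55, 60, 19, 58, 21, 23],
    "flashlight_units": [3, 4, 30, 28, 2, 33, 5, 3],
    "milk_units": [40, 42, 41, 39, 43, 40, 41, 42],
}

for a, b, r in rank_correlations(data):
    print(f"{a:>17} vs {b:<17} r = {r:+.2f}")
```

With real data you would have far more columns and rows, and the more pairs you scan, the more likely it is that some will look interesting purely by chance, so treat the top of the list as leads to investigate rather than findings.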

You can begin experimenting with big data yourself, a process I endorse. You need three basic components:

  1. A way to get the data, whether out of your transaction systems or from external sources, and into a database.
  2. Superfast data processing (a database with enormous amounts of RAM and massively parallel processing). This can be had on a software-as-a-service basis from Amazon and other vendors.
  3. Analytics tools that present the data in the visual form you want. Vendors include Oracle, Teradata, Tableau, Information Builders, QlikView, Hyperion, and many others.
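
To make the three components concrete at toy scale, here is a rough Python sketch. It uses the standard library's in-memory SQLite database as a stand-in for a real high-speed analytics database, and a text bar chart as a stand-in for a visualization tool. The table name, columns, function names, and sales figures are all hypothetical.

```python
import sqlite3

# Hypothetical transaction rows: (day, storm_forecast flag, product, units sold).
ROWS = [
    (1, 0, "pop_tarts", 20), (1, 0, "flashlights", 3),
    (2, 1, "pop_tarts", 55), (2, 1, "flashlights", 30),
    (3, 0, "pop_tarts", 22), (3, 0, "flashlights", 4),
    (4, 1, "pop_tarts", 60), (4, 1, "flashlights", 28),
]


def load(rows):
    """Component 1: get the data into a database (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (day INTEGER, storm INTEGER, product TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def average_units(conn):
    """Component 2: let the database do the aggregation."""
    query = """
        SELECT product, storm, AVG(units)
        FROM sales
        GROUP BY product, storm
        ORDER BY product, storm
    """
    return conn.execute(query).fetchall()


def text_chart(results, scale=2):
    """Component 3: present the result visually (one '#' per `scale` units)."""
    lines = []
    for product, storm, avg in results:
        label = f"{product} ({'storm' if storm else 'calm'})"
        lines.append(f"{label:<22} {'#' * int(avg // scale)} {avg:.1f}")
    return "\n".join(lines)


conn = load(ROWS)
results = average_units(conn)
print(text_chart(results))
```

A production setup would swap each function for the corresponding piece of infrastructure (an extract-and-load job, a parallel database, a BI tool), but the division of labor is the same.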

Correlations are usually easier to spot visually. And visualization is where the market seems to be going, at least in terms of hype and vendor offerings. New insights are always welcome, so we shall see what sells and what doesn’t.

The assessment from Gartner seems about right to me at this point: big data is both 1) currently in the phase Gartner calls the “trough of disillusionment” and 2) promising enough that its use in BI will grow sharply.
