Image: “Data Center.” by Stan Wlechers, CC license
So what is Big Data, particularly Big Data analytics? Why all the hype?
Big Data is what the name implies: tons of data. We’re talking millions or billions of rows here – far more than standard query tools reading data from a disk can handle.
What would constitute “tons” of data? Every bottle of “spring,” “purified” or “mineral” water that was scanned at a grocery store checkout during the month of July 2011; the brand, the price, the size, the name and location of the store, and the day of the week it was bought. That’s six pieces of data, multiplied by the estimated 3.3 billion bottles of water sold monthly in the United States.
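To put a number on that, here is a quick back-of-the-envelope sketch in Python. The six fields and the 3.3 billion bottles come from the example above; the 20 bytes per value is purely an assumption made for illustration, so treat the storage figure as a rough guess rather than a measurement.

```python
# Figures taken from the bottled-water example above.
FIELDS_PER_SCAN = 6                # brand, price, size, store name, store location, day of week
BOTTLES_PER_MONTH = 3_300_000_000  # estimated U.S. monthly sales

# Assumption (not from the article): an average of 20 bytes per stored value.
ASSUMED_BYTES_PER_VALUE = 20


def monthly_data_points(fields: int, rows: int) -> int:
    """Total number of individual values collected in one month."""
    return fields * rows


def approx_size_gb(data_points: int, bytes_per_value: int) -> float:
    """Rough uncompressed size in gigabytes (1 GB = 10**9 bytes)."""
    return data_points * bytes_per_value / 10**9


points = monthly_data_points(FIELDS_PER_SCAN, BOTTLES_PER_MONTH)
print(f"{points:,} data points per month")
print(f"about {approx_size_gb(points, ASSUMED_BYTES_PER_VALUE):,.0f} GB uncompressed")
```

That works out to 19.8 billion individual values for a single product category in a single month.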
Big Data analytics is the process of extracting meaning from all that data.
The analysis of big data is made possible by two developments:
1) The continuation of Moore’s law; that is, computing power and memory capacity have grown exponentially. This has made it possible to process huge amounts of data in memory rather than retrieving that data from disk storage; and
2) “Distributed” computing structures such as Hadoop have made it possible for the processing of large amounts of data to be done on multiple servers at once.
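The idea behind the second development can be sketched in a few lines of Python. This is not Hadoop itself, only a toy illustration of the same split-then-combine pattern: the data is divided into chunks, each worker summarizes its own chunk (the “map” step), and the partial results are merged (the “reduce” step). The sample scans, the function names and the use of threads as stand-ins for separate servers are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample of checkout scans: (brand, bottles sold).
SCANS = [
    ("Brand A", 2), ("Brand B", 1), ("Brand A", 1),
    ("Brand C", 4), ("Brand B", 3), ("Brand A", 5),
]


def split(records, parts):
    """Deal the records into `parts` roughly equal chunks, one per worker."""
    return [records[i::parts] for i in range(parts)]


def map_chunk(chunk):
    """'Map' step: each worker totals bottles per brand for its own chunk."""
    totals = Counter()
    for brand, bottles in chunk:
        totals[brand] += bottles
    return totals


def reduce_totals(partials):
    """'Reduce' step: merge the per-worker totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return combined


def brand_totals(records, workers=3):
    chunks = split(records, workers)
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_totals(partials)


print(dict(brand_totals(SCANS)))
```

Because each chunk is summarized independently, adding more workers (or, in a real cluster, more servers) lets the same job handle more data without changing the logic.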
The hype you read about Big Data may be justified. Big Data does have potential and should not be ignored. With the right software, a virtual picture of the data can be painted in more detail than ever before. Think of it as a photograph, illustration or sketch – with every additional line of clarification or sharpening of detail, the picture comes more into focus.
Michael Malone, writing in The Wall Street Journal, says that some really big things might be possible with big data:
“It could mean capturing every step in the path of every shopper in a store over the course of a year, or monitoring every vital sign of a patient every second for the course of his illness….Big data offers measuring precision in science, business, medicine and almost every other sector never before possible.”
But should your enterprise pursue Big Data analytics? It may already be doing so. If your company processes millions of transactions or has millions of customers, you have a lot of data to begin with.
You need three things to enable Big Data analytics:
- A way to get the data out of your transaction systems or external sources and into a database. Typically this is done with ETL (Extract, Transform, and Load) software tools such as Informatica. Jobs are set up and the data is pulled every hour, day, etc., put into a file and either pushed or pulled into a storage environment.
- Superfast data processing. Today, an in-memory database (a database with enormous amounts of RAM and massively parallel processing) can be acquired and used on a software-as-a-service basis from Amazon Web Services at a very reasonable cost.
- User interface analytics tools that present the data in the visual form you prefer. Vendors include Oracle, Teradata, Tableau, Information Builders, QlikView, Hyperion, and many others. The market here is moving toward data visualization via low-cost, software-as-a-service tools that allow you to aggregate disparate sources of data (internal and external systems, social media, and public sources like weather and demographic statistics).
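To make the three pieces concrete, here is a minimal sketch of the whole flow in Python, using only the standard library. It is a small-scale stand-in, not how Informatica or a commercial in-memory database works: the sample export, the table layout and the function names are all invented for illustration, and SQLite’s in-memory mode is used only because it ships with Python.

```python
import csv
import io
import sqlite3

# Hypothetical export from a transaction system (invented sample data).
RAW_EXPORT = """brand,price,size_oz,store,day
 brand a ,1.29,16.9,Store 12,Mon
Brand B,0.99,16.9,Store 12,Mon
BRAND A,1.49,33.8,Store 40,Tue
"""


def extract(text):
    """Extract: read rows from the exported file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy brand names and convert numbers from text."""
    return [
        (row["brand"].strip().title(), float(row["price"]),
         float(row["size_oz"]), row["store"], row["day"])
        for row in rows
    ]


def load(records):
    """Load: put the cleaned rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scans (brand TEXT, price REAL, size_oz REAL, store TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO scans VALUES (?, ?, ?, ?, ?)", records)
    return conn


def revenue_by_brand(conn):
    """The kind of summary an analytics front end would turn into a chart."""
    query = "SELECT brand, ROUND(SUM(price), 2) FROM scans GROUP BY brand ORDER BY brand"
    return conn.execute(query).fetchall()


conn = load(transform(extract(RAW_EXPORT)))
print(revenue_by_brand(conn))
```

The transform step is where inconsistent entries such as “ brand a ” and “BRAND A” are reconciled, which is why the summary groups them under one brand. A visualization tool would sit on top of a query like the last one.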