5 Steps to Sifting Through Big Data

Posted on 16/08/12 at 9:15 am

Big data is the latest buzz in data analytics and storage circles. It refers to storing, analyzing and drawing actionable business intelligence from data sets of millions or billions of records. Following a few general guidelines helps an organization fully leverage the business insight that big data analysis can bring.

Bring the Right Tools

Big data calls for big tools. Spreadsheet applications are great for summarizing and viewing trends in smaller data sets, but big data requires enterprise-ready applications that can extrapolate from business data, identify trends and correlate disparate data sources. Such applications can take the quantitative data stored in a company’s enterprise information systems and turn it into qualitative insight, revealing previously unknown trends and correlations over time.

Understand the Relationships

It is easy to miss both correlation and causation in a big data environment. The countless rows of data can be overwhelming without a basic understanding of the business and how data elements are related to one another. Data architects working in a big data environment should be intimately familiar with the business and maintain communication with project managers within the business to validate that business data relationships are fully understood when developing a big data analytics solution.
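One way to validate that a data relationship holds as the business describes it is to check it directly against the records. The sketch below is only an illustration: the `customers` and `orders` data, their field names and the `orphaned_orders` helper are all hypothetical, and it assumes the business rule is "every order belongs to a known customer".

```python
# Hypothetical reference data: customer id -> customer name.
customers = {101: "Acme Ltd", 102: "Birch & Co", 103: "Cedar Inc"}

# Hypothetical transactional data that should reference the customers above.
orders = [
    {"order_id": 1, "customer_id": 101, "total": 250.0},
    {"order_id": 2, "customer_id": 104, "total": 75.0},
    {"order_id": 3, "customer_id": 102, "total": 120.0},
]


def orphaned_orders(orders, customers):
    """Return the ids of orders whose customer_id has no matching customer."""
    return [o["order_id"] for o in orders if o["customer_id"] not in customers]


orphans = orphaned_orders(orders, customers)
print(orphans)
```

Any orphaned records found this way are a prompt to go back to the business and confirm how the two data sets are really related before building analytics on top of them.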

Ask the Right Questions

A clear question must be defined before data analysis can begin in a big data environment. Business operations and the analytics team must collaborate closely to craft the specific questions the analysis should answer, and only then determine the best way to query and analyze the data. Narrowly focused questions limit the resulting data set to a more manageable size that is much easier to analyze on the micro scale. Broader questions can be broken down into smaller business questions that are easier to answer and that together answer the broader ones.
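To illustrate how a narrow question shrinks the data set, here is a minimal sketch. The broad question "why are returns up?" is narrowed to "which orders from one region in one quarter were returned?". The `orders` records, their field names and the `returned_in_region` helper are assumptions made up for this example.

```python
from datetime import date

# Hypothetical order records; the field names are illustrative only.
orders = [
    {"region": "North", "ordered": date(2012, 5, 3), "returned": True},
    {"region": "North", "ordered": date(2012, 5, 9), "returned": False},
    {"region": "South", "ordered": date(2012, 5, 12), "returned": True},
    {"region": "North", "ordered": date(2012, 1, 20), "returned": True},
]


def returned_in_region(records, region, start, end):
    """Keep only returned orders for one region within a date window."""
    return [
        r for r in records
        if r["region"] == region
        and start <= r["ordered"] <= end
        and r["returned"]
    ]


# Narrow question: returned orders in the North region during Q2 2012.
subset = returned_in_region(orders, "North", date(2012, 4, 1), date(2012, 6, 30))
print(len(subset), "of", len(orders), "records match")
```

The same filter can be rerun for other regions or quarters, so each small answer feeds back into the broader question.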

Identify Trends

Looking for trends within large data sets can be a challenging undertaking. As stated previously, a proper data analytics package makes this part of the task much easier. Data analysts should examine the data carefully, slicing it by various measures, metrics and attributes, to identify trends.
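Slicing by different attributes can be sketched in a few lines. The example below totals a measure by month and by region and then checks whether the monthly totals rise steadily. The `sales` tuples, the `totals_by` and `is_rising` helpers and the "strictly increasing" definition of a trend are all assumptions for illustration; a real analytics package would offer far richer tools.

```python
from collections import defaultdict

# Hypothetical records: (month, region, sales amount).
sales = [
    ("2012-01", "North", 100.0), ("2012-01", "South", 80.0),
    ("2012-02", "North", 120.0), ("2012-02", "South", 85.0),
    ("2012-03", "North", 150.0), ("2012-03", "South", 90.0),
]


def totals_by(records, key_index):
    """Sum the amount (field 2) grouped by the field at key_index."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)


def is_rising(series):
    """True if values increase strictly when the keys are taken in sorted order."""
    values = [series[k] for k in sorted(series)]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


by_month = totals_by(sales, 0)   # slice by time
by_region = totals_by(sales, 1)  # slice by attribute
print(by_month, by_region, is_rising(by_month))
```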

Look for Causal Factors

After identifying trends, the next step in slicing a big data set is to look for causal factors. Data analysts will often find that two or more trends correlate directly with one another, which suggests that one trend may be a causal factor of the other. Correlation alone does not prove causation, so candidate causes should be checked against knowledge of the business. This type of analysis ultimately leads to actionable business data that can be used immediately to improve or streamline business operations.
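A common first check on whether two trends move together is the Pearson correlation coefficient. The sketch below computes it with the standard library only. The `ad_spend` and `weekly_sales` series are invented for illustration, and a coefficient near 1 or -1 only flags a candidate relationship worth investigating; it does not show which trend, if either, causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly trends identified in the previous step.
ad_spend = [10, 12, 15, 19, 24]
weekly_sales = [100, 108, 121, 140, 160]

r = pearson(ad_spend, weekly_sales)
print(round(r, 3))
```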

Author bio: Caitlin Laura is a senior developer with Experian QAS. She enjoys reading and writing about technology. She also enjoys travelling with her husband and two kids.
