by Paul Rudo on 18/10/12 at 4:21 pm
Big Data has the power to provide tremendous value. It allows companies to make better decisions, based on real-time objective numbers, and it also allows organizations to provide a more personalized level of service for customers. But there are a number of technological challenges which must be overcome when launching a new Big Data initiative.
Large Volumes of Data
Obviously, Big Data means handling lots of information. The sheer volume of storage involved in most Big Data projects poses a serious challenge for developers and architects, who must figure out how to take in, process, manage and analyze these massive and fast-growing data volumes.
Rather than plan for today’s capacity needs, developers must focus on meeting the data processing needs of the future, which are growing exponentially thanks to data from mobile devices, online services, GPS and satellite information, third-party data streams, Internet-enabled sensors and other sources.
Speed of Data Consumption
Big Data systems must be designed to robustly take in data streams from many different sources, and then turn this information around quickly in order to provide timely, actionable insight. Much of the power of Big Data lies in its real-time nature, since the most recent information is the most actionable and has the greatest potential to provide business value.
Improvements in worldwide internet connectivity and in the processing power of data-producing devices will only accelerate the pace at which data must be ingested, understood and turned into useful information.
Unstructured Data
In the old database paradigm, humans had the luxury of defining constraints on how data could be entered while also enforcing a structure to which that information had to adhere. What sets Big Data technologies – such as MapReduce – apart is their ability to work efficiently with unstructured data.
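To make the MapReduce idea concrete, here is a minimal sketch in plain Python (not any particular framework’s API): a map phase emits key-value pairs from raw, schema-free text, and a reduce phase aggregates them per key. The document strings are hypothetical placeholders.

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for every word in a raw,
    # unstructured text document -- no predefined schema required.
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    # Reduce step: group pairs by key and sum the counts per word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Hypothetical unstructured inputs: free-form text with no schema.
documents = [
    "big data systems ingest raw text",
    "raw text has no fixed schema",
]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(pairs)
```

In a real MapReduce deployment the map and reduce phases run in parallel across many machines, with the framework shuffling pairs between them; the logic per document, however, stays this simple, which is what makes the model a good fit for unstructured data at scale.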
The challenge comes from the fact that fast-growing data volumes are difficult or impossible to structure manually due to their size, rapid growth, and constant change. Establishing context among all of these disparate data structures requires combining the data in a coherent way, while also allowing for easy integration of newly available data sources.
Complexity also stems from the fact that Big Data applications increasingly rely on large proportions of unstructured data (often 80% or more), which must be interpreted, correlated and analyzed before it can be used.
When using exponentially growing unstructured data pools, privacy and veracity become very real concerns. It is especially problematic when erroneous conclusions are drawn about customers, which can lead to degraded service and potential legal liability. Additionally, it’s important that information retained, shared, generated or extracted from these systems does not violate the client’s expectation of privacy or reveal any other inappropriate information.