Whether it's fine-tuning supply chains, monitoring shop floor operations, gauging consumer sentiment, or any number of other large-scale analytic challenges, big data is having a tremendous impact on the enterprise. The amount of business data generated has risen steadily every year, and more and more types of information are being stored in digital formats.
One of the challenges entails learning how to deal with all of these new data types and determining which information can potentially provide value to your business. What matters is not just access to new data sources, such as selected events, transactions, or blog posts, but the patterns and interrelationships among these elements. Collecting lots of diverse types of data very quickly does not by itself create value. You need analytics to uncover insights that will help your business. That’s what this paper is about.
Big data brings not only new data types and storage mechanisms, but new types of analysis as well. In the following pages we discuss the various ways to analyze big data to find patterns and relationships, make informed predictions, deliver actionable intelligence, and gain business insight from this steady influx of information.
Big data analysis is a continuum, not an isolated set of activities. Thus you need a cohesive set of solutions for big data analysis, from acquiring the data and discovering new insights to making repeatable decisions and scaling the associated information systems for ongoing analysis. Many organizations accomplish these tasks by coordinating the use of both commercial and open source components. Having an integrated architecture for big data analysis makes it easier to perform various types of activities and to move data among these components.
The Dawn of Big Data
Data becomes big data when its volume, velocity, or variety exceeds the abilities of your IT systems to ingest, store, analyze, and process it. Many organizations have the equipment and expertise to handle large quantities of structured data, but with the increasing volume and faster flows of data, they lack the ability to “mine” it and derive actionable intelligence in a timely way. Not only is the volume of this data growing too fast for traditional analytics, but the speed with which it arrives and the variety of data types necessitate new types of data processing and analytic solutions.
Moreover, big data doesn’t always fit into neat tables of columns and rows. There are many new data types, both structured and unstructured, that can be processed to yield insight into a business or condition. For example, data from Twitter feeds, call detail reports, network data, video cameras, and equipment sensors often isn’t stored in a data warehouse until you have pre-processed it to distill and summarize it, and perhaps to detect basic trends and associations. It is more cost-effective to load only the results into a warehouse for additional analysis. The idea is to “reduce” the data to the point that it can be put in a structured form. Then it can be meaningfully compared to the rest of your data and scrutinized with traditional business intelligence (BI) tools.
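The “reduce” step described above can be illustrated with a minimal Python sketch. It assumes a small, made-up set of equipment sensor readings (the sensor names, timestamps, and values are hypothetical) and condenses them into one structured row per sensor per hour, which is the kind of summary that could then be loaded into a warehouse table. It is an illustration of the idea under those assumptions, not a description of any particular product's pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (sensor_id, ISO-8601 timestamp, temperature).
# In practice these would stream in from equipment sensors at high volume.
RAW_READINGS = [
    ("pump-1", "2024-01-01T10:00:05", 70.0),
    ("pump-1", "2024-01-01T10:30:00", 72.0),
    ("pump-1", "2024-01-01T11:05:00", 75.0),
    ("pump-2", "2024-01-01T10:15:00", 65.0),
    ("pump-2", "2024-01-01T10:45:00", 66.0),
]


def summarize_by_hour(readings):
    """Reduce raw readings to one structured row per (sensor, hour)."""
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in readings:
        hour = timestamp[:13]  # keep "YYYY-MM-DDTHH", drop minutes and seconds
        buckets[(sensor_id, hour)].append(value)

    rows = []
    for (sensor_id, hour), values in sorted(buckets.items()):
        rows.append({
            "sensor_id": sensor_id,
            "hour": hour,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": round(mean(values), 2),
        })
    return rows


SUMMARY = summarize_by_hour(RAW_READINGS)
for row in SUMMARY:
    print(row)
```

Each output row has a fixed set of columns, so it fits a relational table even though the raw feed does not need to. At real volumes the same grouping logic would typically run on a distributed platform rather than in a single process.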
Big Data Analytics
Merging Traditional and Big Data Analysis
Taking advantage of big data often involves a progression of cultural and technical changes throughout your business, from exploring new business opportunities to expanding your sphere of inquiry to exploiting new insights as you merge traditional and big data analytics. The journey often begins with traditional enterprise data and tools, which yield insights about everything from sales forecasts to inventory levels. The data typically resides in a data warehouse and is analyzed with SQL-based business intelligence (BI) tools. Much of the data in the warehouse comes from business transactions originally captured in an OLTP database. While reports and dashboards account for the majority of BI use, more and more organizations are performing “what-if” analysis on multi-dimensional databases, especially within the context of financial planning and forecasting. These planning and forecasting applications can benefit from big data, but organizations need advanced analytics to make this goal a reality.
For more advanced data analysis such as statistical analysis, data mining, predictive analytics, and text mining, companies have traditionally moved the data to dedicated servers for analysis. Exporting the data out of the data warehouse, creating copies of it in external analytical servers, and deriving insights and predictions is time-consuming. It also requires duplicate data storage environments and specialized data analysis skills. Once you’ve successfully built a predictive model, using that model with production data involves either complex rewriting of the model or the additional movement of large volumes of data from a data warehouse to an external data analysis server. At that point the data is “scored” and then the results are moved back to the data warehouse. This cycle of moving and re-purposing data to create actionable information can take days, weeks, or even months to complete.
While many organizations have achieved proficiency in exploiting their data through data analysis, they are still at the early stages of creating an analytic model that can deliver real business value from big data. The main obstacles are the slow and arcane processes required to gain direct and timely access to corporate data. However, new technologies are collapsing the old walls between IT and data analysts by enabling advanced analytics within the database itself, alleviating the need to move large volumes of data around.
At the same time, new types of data are supplementing traditional data sources and familiar BI activities. For example, weblog files track the movement of visitors to a website, revealing who clicked where and when. This data can reveal how people interact with your site. Social media helps you understand what people are thinking or how they feel about something. Big data can be derived from web pages, social media sites, tweets, blog entries, email exchanges, search indexes, click streams, equipment sensors, and all types of multimedia files, including audio, video, and photographs. This data can be collected not only from computers, but also from billions of mobile phones, tens of billions of social media posts, and an ever-expanding array of networked sensors from cars, utility meters, shipping containers, shop floor equipment, point of sale terminals, and many other sources.
Most of this data is less dense and more information-poor than traditional enterprise data, and doesn’t fit immediately into your data warehouse. As we will see, some of it is better placed in the Hadoop Distributed File System (HDFS) or in non-relational databases, commonly called NoSQL databases. In many cases, this is the starting point for big data analysis.
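To make the weblog example above concrete, the following Python sketch shows one way to extract “who clicked where” from raw log lines. It assumes the logs follow the widely used Common Log Format; the sample lines, IP addresses, and paths are invented for illustration, and the function names are hypothetical rather than part of any specific tool.

```python
import re
from collections import Counter, defaultdict

# Pattern for the Common Log Format (an assumption about the log layout):
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

# Hypothetical sample lines; a real weblog would contain millions of these.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:14:02:10 +0000] "GET /products HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:14:02:45 +0000] "GET /missing HTTP/1.1" 404 128',
    'this line is malformed and will be skipped',
]


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize_clicks(lines):
    """Count successful page hits and distinct visitor IPs per path."""
    hits = Counter()
    visitors = defaultdict(set)
    for line in lines:
        record = parse_line(line)
        if record is None or record["status"] != "200":
            continue  # ignore malformed lines and unsuccessful requests
        hits[record["path"]] += 1
        visitors[record["path"]].add(record["ip"])
    unique = {path: len(ips) for path, ips in visitors.items()}
    return hits, unique


HITS, UNIQUE_VISITORS = summarize_clicks(SAMPLE_LOG)
print(HITS.most_common())
print(UNIQUE_VISITORS)
```

Counting distinct IP addresses is only a rough proxy for distinct visitors, since several people can share one address. The sketch also shows why this kind of data is described as information-poor: most of each line is discarded, and only a small structured summary is kept.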