Originally published 28 January 2009
My recent article, Business Intelligence Data Analysis and Visualization: What’s in a Name? Part 1, generated some good discussion and a certain amount of controversy. Most of the feedback I received focused on my conclusions. Although the primary intent of the article was to summarize the views of other experts on data analysis and data visualization, some of my own views crept into the conclusions without adequate explanation. To remedy this, I decided this month to take a step back and look at the role of business intelligence in decision making, and then review how data analysis and data visualization fit into the overall business intelligence environment.
There are three main types of processes that business users employ in their jobs: operational processes to run the business, analytical processes to monitor business performance and provide users with the information they need to make informed decisions, and collaborative processes to support the sharing of information and business user interaction and collaboration. The role of a business intelligence (BI) system in this environment is to provide the applications, tools, techniques, policies and procedures for supporting analytical processes.
Figure 1 depicts the main tasks supported by a BI system. The discover task is used to find data that can aid business decision making and action taking. Examples of data discovery technologies include enterprise search and data profiling. The results from data discovery can be used to help build the business and data models that are used to access operational data.
Figure 1: Using Business Intelligence to Make Decisions
The access task retrieves operational data of interest and integrates it into a shared data store such as a data warehouse. In the case of operational business intelligence, the operational data may be passed directly to the analyze task without first being integrated into a shared data store.
Data processed by the analyze task is accessed from the shared data store and/or directly from operational systems. The output from the analyze task may consist of business performance analytics, scorecards, alerts (warnings and recommendations) or action messages. Scorecards compare the performance analytics with planned performance data from planning and budgeting systems. Alerts and action messages are created based on business rules defined by the business user.
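As a rough illustration of the scorecard and alert outputs described above, the following Python sketch compares actual figures with planned figures and raises an alert when a simple business rule is violated. Everything in it is an assumption made for illustration: the metric names, the sample numbers, the 10 percent threshold, and the function names `build_scorecard` and `raise_alerts` are hypothetical and do not come from any particular BI product.

```python
from dataclasses import dataclass


@dataclass
class ScorecardLine:
    """One scorecard row: a metric with its actual and planned values."""
    metric: str
    actual: float
    planned: float

    @property
    def variance_pct(self) -> float:
        # Percentage deviation of actual from plan (planned must be non-zero).
        return (self.actual - self.planned) / self.planned * 100.0


def build_scorecard(actuals: dict, plan: dict) -> list:
    # Compare only the metrics that appear in both the actuals and the plan.
    shared = sorted(actuals.keys() & plan.keys())
    return [ScorecardLine(m, actuals[m], plan[m]) for m in shared]


def raise_alerts(scorecard: list, threshold_pct: float = 10.0) -> list:
    # Hypothetical business rule: alert when a metric deviates from plan
    # by more than threshold_pct in either direction.
    return [
        f"ALERT: {line.metric} is {line.variance_pct:+.1f}% versus plan"
        for line in scorecard
        if abs(line.variance_pct) > threshold_pct
    ]


# Sample data, invented for this example.
actuals = {"revenue": 85_000.0, "units_sold": 1_020.0}
plan = {"revenue": 100_000.0, "units_sold": 1_000.0}

scorecard = build_scorecard(actuals, plan)
alerts = raise_alerts(scorecard)
for message in alerts:
    print(message)
```

In a real BI system the planned figures would come from the planning and budgeting systems, and the rule and its threshold would be defined by the business user rather than hard-coded.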
The output from the analyze task is delivered to business users or business applications, or stored in a shared data store for subsequent retrieval. A wide range of different hardware devices and data formats exists for delivering results. Business users may, for example, employ desktop and mobile interactive interfaces to view and evaluate the analysis results. These interfaces may also be used to retrieve result data sent by email or stored in a collaborative workgroup system or a data warehouse. This stored data, in turn, may be in the form of text files, spreadsheets, office file formats, Adobe PDF files and so forth.
After evaluating the results from the analyze task, business users collaborate and decide if any action is required to resolve a business problem, or perhaps more closely align actual business performance with planned performance. The impact of any changes made to operational processing as a result of these actions can then be measured by the BI system in what is known as a closed-loop decision-making process.
In some cases, in order to make a decision, further analysis is required, and business users then iterate between the analyze and deliver tasks until they have the information they need to make a decision. This iterative processing is often done using interactive visual data analysis tools and applications. Some BI products are adding guided analysis features based on best practices to assist in this iterative processing.
To date, BI vendors have focused primarily on the access and analyze tasks, including the data integration that the access task performs. The assumption has been that business users know where the data they need to analyze is located, understand its business meaning, and are quite capable of using the complex tools provided for analyzing data and delivering the results.
This assumption may be true for certain business and data analysts who spend their days on computer systems working with this data, but it is not the case for the majority of business users and information consumers. This is why BI systems are used by only a fraction of the users who could potentially benefit from them. It is also why vendors are beginning to realize that they need to do a better job of supporting the discover and deliver tasks of a BI system if they are to increase the use of business intelligence in organizations.
In the area of data discovery, vendors are adding data profiling tools that enable IT users to evaluate and analyze the content of data sources for business value and data quality. These tools can be used to design data and business models, and data cleanup and transformation routines. For business users, vendors are adding search capabilities that allow them to easily search for and analyze data of interest in source systems.
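To give a sense of what a data profiling tool reports, here is a minimal Python sketch that profiles a single column of source data for completeness and distinct values. It is only a toy under stated assumptions: the function name `profile_column`, the rule that `None` and empty strings count as missing, and the sample country codes are all hypothetical, and commercial profiling tools go far beyond these few counts.

```python
def profile_column(values):
    """Return simple data quality statistics for one column of values."""
    total = len(values)
    # Assumption: None and the empty string both count as missing data.
    present = [v for v in values if v is not None and v != ""]
    missing = total - len(present)
    return {
        "rows": total,
        "missing": missing,
        "distinct": len(set(present)),
        # Share of rows with no usable value; 0.0 for an empty column.
        "missing_ratio": missing / total if total else 0.0,
    }


# Sample country codes, invented for this example.
sample = ["US", "UK", None, "US", "", "DE"]
profile = profile_column(sample)
print(profile)
```

Statistics like these help IT users judge whether a source column is complete and consistent enough to be worth integrating, and which cleanup routines it may need first.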
For data delivery, there is increasing integration with office and collaborative products, growing use of web-based rich Internet applications to improve interactivity and data presentation, and enhanced visual data analysis tools that support an iterative approach to the analyze and deliver tasks.
Now that we have a high-level view of how a BI system operates, let’s revisit the terms data analysis and data visualization. Along the way, I also want to review where the term data integration fits into the terminology puzzle.
The concept of data analysis existed even before the advent of the computer and, as a result, there is a wide variety of definitions for data analysis. I like the one from Wikipedia:
“Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.”
I think the key phrases in this definition are “highlighting useful information” and “supporting decision making.” In the case of a BI system, business users employ tools that support the analyze task to produce meaningful information that helps them make decisions. Similarly, IT users employ data profiling tools to analyze operational data for business value and data quality.
One issue that arose from Part 1 of this article concerns the term data integration and its relationship to data analysis. The problem with the term data integration is that it is a grouping term. A data integration platform, for example, typically includes data quality management and data profiling facilities. It could be argued that such a platform supports data analysis for IT users.
Let’s assume, however, that data integration means taking data from a source system and transforming it into a format that can be stored in a shared data store or passed to a tool that supports the analyze task.
The important word here is transform. In the case of data integration, transformation may involve changing the structure and/or content of the data. Data content changes can be wide ranging. Some transformation steps, for example, may involve aggregating the data, or doing a statistical analysis of the data. Data mining algorithms may also be used. This type of processing satisfies the Wikipedia definition for data analysis because the transformation step is deriving useful information from the source data.
In the conclusions of my article last month, I asked the question, “Are data transformation and integration different from data analysis?” The answer is that data integration involves data transformation and that some data transformations may involve data analysis.
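A small Python sketch may help show how a transformation step can itself perform data analysis. It aggregates order records by region before they would be loaded into a shared data store, so the stored output already contains derived information. The record layout, the sample orders and the function name `summarize_by_region` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Sample operational records, invented for this example.
orders = [
    {"region": "East", "amount": 120.0},
    {"region": "East", "amount": 80.0},
    {"region": "West", "amount": 200.0},
]


def summarize_by_region(rows):
    """Transform detail rows into one summary row per region."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["amount"])
    # The aggregation changes both the structure (one row per region)
    # and the content (counts, totals and averages) of the data.
    return {
        region: {
            "order_count": len(amounts),
            "total": sum(amounts),
            "average": mean(amounts),
        }
        for region, amounts in grouped.items()
    }


summary = summarize_by_region(orders)
print(summary)
```

The summary rows are what would be stored or passed to the analyze task. The totals and averages they contain are useful information derived from the source data, which is why a step like this fits the definition of data analysis quoted earlier.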
Data visualization is a trickier term to define. Wikipedia provides this definition:
“Data visualization is the study of the visual representation of data, defined as information which has been abstracted in some schematic form, including attributes or variables for the units of information.”
The Wikipedia entry goes on to state, “The main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way.”
I think the key phrases here are “convey ideas effectively” and “providing insights.”
In Part 1, I asked, “Is data presented for presentation purposes only a form of data visualization?” The answer is that the deliver task presents the results from the analyze task to business users. In doing so, it may employ data visualization techniques and technologies to improve the understanding of the results. The business user may then iterate through the analyze and deliver tasks to get a better understanding of the data. Visual data analysis tools make it easier for business users to do this iterative processing.
I hope this clarifies my conclusions in last month’s article and adds to the understanding of the role of business intelligence in business decision making and action taking.
Recent articles by Colin White