When companies first began building data warehouses, life was relatively simple. Developers used data integration tools to extract and transform structured data from traditional line-of-business (LOB) operational applications and load it into a data warehouse for processing by business intelligence (BI) reporting and analysis tools and applications.
Today, things are quite different. The creation of analytics is no longer solely managed by the BI and data warehousing group. Analytics are now deployed by a variety of different organizations within a company. Examples include content analytics (content and document management group), event analytics (applications group) and web analytics (web development group). The growing use of collaboration and social computing tools will inevitably lead to the development of collaborative analytics.
Many of the people building these analytical solutions have little knowledge of business intelligence and data warehousing, and often have little desire, or sometimes even the need, to gain such knowledge. It is therefore naïve for the BI group to assume that this morass of information can be brought under its control and fully integrated into a data warehousing environment. Web analytics, the subject of this article, is an excellent example of this issue because any of the web, application development or BI groups may develop these analytics.
Why Web Analytics?
The Web Analytics Association (WAA) defines web analytics as “The measurement, collection, analysis and reporting of Internet data for the purpose of understanding and optimizing Web usage.” One role of the WAA is to define standard metrics that products should ideally support. Examples of these metrics include page views, visits, unique visitors, new visitors, returning visitors, clickthroughs and conversions (number of times a desired outcome was achieved).
The WAA standard metrics illustrate one of the main uses of web analytics, which is to identify visitors and how often they visit a website, characterize their visits, and measure how often the visit is successful (which usually means they purchased a product or service).
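As a rough illustration of how a few of these metrics relate to one another, the sketch below computes page views, visits, unique visitors and conversions from a list of page hits. The record layout, the 30-minute inactivity cutoff that separates one visit from the next, and the idea of counting a conversion as a view of a designated goal page are all assumptions made for the example, not WAA definitions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity cutoff: a longer gap between hits starts a new visit.
SESSION_TIMEOUT = timedelta(minutes=30)


def summarize(hits, goal_page):
    """Summarize hits given as (visitor_id, timestamp, page) tuples."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, page in hits:
        by_visitor[visitor_id].append((ts, page))

    page_views = visits = conversions = 0
    for events in by_visitor.values():
        events.sort()  # process each visitor's hits in time order
        last_ts = None
        for ts, page in events:
            page_views += 1
            if last_ts is None or ts - last_ts > SESSION_TIMEOUT:
                visits += 1
            if page == goal_page:  # hypothetical "desired outcome" page
                conversions += 1
            last_ts = ts

    return {
        "page_views": page_views,
        "visits": visits,
        "unique_visitors": len(by_visitor),
        "conversions": conversions,
    }
```

Real products apply considerably more nuanced rules (for example, for excluding robots), so this is only meant to show how the counts are derived from the same raw hits.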
There are a few things to notice about the WAA metrics. First, they report after-the-fact summaries of what has happened over a previous period of time, so they are really intended for tactical and strategic decision making. A different class of tool is required for more operational decision making, such as fraud detection or near-real-time marketing campaigns.
Another thing to note about the WAA metrics is the concept of a visitor. How is a visitor identified? If the visitor purchases a product or service, then clearly more information is available – for example, name, address, phone number and credit card number. In this case, relating the visitor to interactions through other channels, such as a retail store or service center, may be useful. When these types of customer relationships need to be identified, yet another class of web analytics application may be required.
If a visitor does not purchase a product, how does a web analytics tool identify a returning visitor? One method is by creating a cookie on the visitor’s computer. This only works, of course, if the visitor has cookies enabled and has not deleted the cookie since the last visit. This brings us to the next topic, which is how products gather raw web data for analysis.
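The cookie approach can be sketched roughly as follows, using Python's standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime and the function itself are hypothetical choices for illustration; they are not taken from any particular product.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, is_returning, set_cookie_value_or_None)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        # The browser sent the cookie back, so this is a returning visitor.
        return cookie[COOKIE_NAME].value, True, None

    # No cookie (first visit, cookies disabled, or cookie deleted):
    # issue a fresh identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed one-year lifetime
    out[COOKIE_NAME]["path"] = "/"
    return visitor_id, False, out[COOKIE_NAME].OutputString()
```

Note that a visitor who has deleted the cookie is indistinguishable from a new visitor here, which is exactly the limitation described above.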
How Do Web Analytics Products Work?
A good example of a product that uses page tagging is Google Analytics, which is a free software-as-a-service (SaaS) offering by Google that generates detailed visitor metrics for a website. The offering is intended for marketers rather than webmasters. It is especially useful for measuring the effectiveness of marketing campaigns that employ Google’s AdWords marketing feature. Websites with fewer than 5 million page views per month can use the service even if they don’t have an AdWords account.
Google Analytics relies on the Google Analytics Tracking Code (GATC), which is added to every page of a website that is to be tracked. The code collects visitor data and sends it back to Google data collection servers. The servers process the data at regular intervals and create reports for on-demand access by the website owner. Google also provides the fee-based Urchin Software product for in-house use.
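To make the page-tagging flow more concrete, here is a generic sketch of the data exchange between a tag and a collection server: the tag encodes a few fields into a request URL, and the server decodes them. The endpoint, the parameter names (`vid`, `page`, `ref`) and both functions are invented for this example and do not describe Google's actual tracking protocol.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collection endpoint; not a real service.
COLLECTOR_URL = "https://collector.example/hit.gif"


def build_beacon(visitor_id, page, referrer):
    """Build the URL a page tag might request to report one page view."""
    params = {"vid": visitor_id, "page": page, "ref": referrer}
    return COLLECTOR_URL + "?" + urlencode(params)


def parse_beacon(request_url):
    """Decode the fields of a beacon request on the collection server."""
    query = parse_qs(urlsplit(request_url).query)

    def first(key):
        # parse_qs returns a list per key; take the first value if present.
        return query.get(key, [None])[0]

    return {"visitor_id": first("vid"), "page": first("page"), "referrer": first("ref")}
```

In a real deployment the tag itself runs in the visitor's browser; the building side is shown in Python here only so the round trip can be followed in one language.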
SaaS and in-house offerings that compete with Google Analytics include Coremetrics, Omniture (recently acquired by Adobe) SiteCatalyst, Unica Analytics and Yahoo Web Analytics. CMS Watch has an excellent report for purchase that compares these and other web analytics products. The CMS Watch website also has a free report appendix that documents how these products support WAA metrics.
One important factor to consider when purchasing a web analytics product is its ability to handle web pages containing dynamic content involving Rich Internet Applications (RIA) built using technologies such as Ajax and Adobe Flash. The ability to track RSS syndication readership and mobile users may also be important for certain organizations. Some other vendors, SeeWhy for example, provide very specific applications for web marketing.
All of the products above support page tagging. A few also support the processing of log files. The table below compares the tagging and log file approaches.
Web log files are also an ideal data source for a data warehousing environment. This latter approach enables web data to be correlated with other types of enterprise data. Given the data volumes involved, a certain level of filtering and consolidation may be required before the log data can be loaded into a data warehouse. This task can be done using standard data integration tools that support flat files or using technologies such as Hadoop MapReduce.
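The filtering and consolidation step might look something like the sketch below, which reads web server log lines, discards requests that are unlikely to matter for analysis, and rolls the rest up into daily page counts ready for loading. It assumes the widely used Common Log Format; the list of static file suffixes and the rule of keeping only status 200 responses are illustrative choices, not recommendations.

```python
import re
from collections import Counter
from datetime import datetime

# Matches Common Log Format lines such as:
# 10.0.0.1 - - [10/Oct/2010:13:55:36 -0700] "GET /products HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Assumed set of static assets to filter out before loading.
STATIC_SUFFIXES = (".gif", ".png", ".jpg", ".css", ".js", ".ico")


def consolidate(lines):
    """Return a Counter of successful page requests keyed by (date, path)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        if match["status"] != "200":
            continue  # keep only successful responses
        path = match["path"].split("?", 1)[0]  # drop the query string
        if path.lower().endswith(STATIC_SUFFIXES):
            continue  # images, stylesheets and scripts are not page views
        day = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z").date()
        counts[(day.isoformat(), path)] += 1
    return counts
```

At larger volumes the same filter-then-aggregate logic maps naturally onto a MapReduce job, with the parsing and filtering in the map phase and the counting in the reduce phase.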
Real-Time Web Analytics
When real-time or near-real-time web analytics are required, two other approaches are available from vendors. The first is business activity monitoring (BAM), where the business transactions generated by web interactions are tracked and analyzed as they flow through operational systems. Application server vendors such as IBM (WebSphere Business Monitor) and Oracle (Oracle BAM) support these types of analytics.
BAM is useful for analyzing a stream of business transactions and providing real-time reports and dashboards. When more complex processing of transaction and event streams is required, products that support complex event processing (CEP) can be used. CEP solutions can analyze and correlate multiple streams of current data (and also possibly historical data) looking for patterns and trends, and predicting possible outcomes. Examples of vendors here include Aleri, IBM (WebSphere Business Events, InfoSphere Streams), Oracle (Oracle CEP), Tibco (BusinessEvents) and Truviso. Note that some vendors prefer the term business event processing (BEP) to CEP, while others use terms such as continuous intelligence and continuous analytics.
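The kind of windowed pattern matching that CEP products perform can be hinted at with a much simpler, single-pattern sketch: flag a visitor who produces several failed payment events within a short time window, as a fraud detection rule might. The event type name, the threshold, the window length and the class itself are all hypothetical, and the code assumes events arrive in timestamp order; a real CEP engine would handle many streams, out-of-order events and far richer patterns.

```python
from collections import defaultdict, deque


class FailureBurstDetector:
    """Fires when one visitor has `threshold` failures within the window."""

    def __init__(self, threshold=3, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # visitor_id -> recent failure times

    def observe(self, visitor_id, event_type, timestamp):
        """Feed one event (timestamp in seconds); return True if the pattern fires."""
        if event_type != "payment_failed":  # hypothetical event type
            return False
        recent = self._recent[visitor_id]
        recent.append(timestamp)
        # Drop failures that have slid out of the time window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.threshold
```

Because each event is evaluated as it arrives, an alert can be raised while the visitor is still on the site, in contrast to the after-the-fact summaries produced by batch reporting.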
In the high-growth and volatile business environment of the Internet, organizations need to be able to monitor and analyze business performance in the same way they do in the more traditional business environment. Without good performance data, it is difficult to optimize business operations on the web and thus remain competitive. There are many different approaches and tools for producing web analytics. These range from SaaS tools such as Google Analytics to BAM, CEP, and data warehousing and BI offerings provided by enterprise software vendors. The SaaS offerings provide targeted solutions for web marketing, whereas the enterprise solutions enable web data to be merged with other corporate data to provide an enterprise perspective on web processing.
SOURCE: The Extremes of Web Analytics: From Google to BAM and Business Intelligence
Recent articles by Colin White