Blog: Colin White

I like the various blogs associated with my many hobbies, and even those to do with work. I find them very useful, and I was excited when the Business Intelligence Network invited me to write my very own blog. At last I have somewhere to park all the various tidbits that I know are useful but am not sure what to do with. I am interested in a wide range of information technologies, so you may find my thoughts bounce around a bit. I hope these thoughts will provoke some interesting discussions. Copyright 2009

Report from the Big Data Summit and Hadoop World

Last week I presented at the Big Data Summit and attended Hadoop World in New York. Both events focused on the use of Hadoop and MapReduce for processing and analyzing very large amounts of data.

The Big Data Summit was organized by Aster Data and sponsored by Informatica and MicroStrategy. Given that the summit was held in the same hotel as Hadoop World the following day, it would be reasonable to expect most attendees to be at both events. This was not entirely the case. Many of the summit attendees came from enterprise IT backgrounds, and these folks were clearly interested in the role of Hadoop in enterprise systems. While many of them were knowledgeable about Hadoop, an equal number were not.

The message coming out of the event was that Hadoop is a powerful tool for the batch processing of huge quantities of data, but coexistence with existing enterprise systems is fundamental to success. This is why Aster Data decided to use the event to launch their Hadoop Data Connector, which uses Aster's SQL-MapReduce (SQL-MR) capabilities to support the bi-directional exchange of data between Aster's analytical database system and the Hadoop Distributed File System (HDFS). One important use of Hadoop is to preprocess, filter, and transform vast quantities of semi-structured and unstructured data for loading into a data warehouse. This can be thought of as Hadoop ETL. Good load performance in this environment is critical.
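The "Hadoop ETL" pattern can be sketched with a simple Streaming-style mapper. This is purely illustrative: the four-field tab-delimited web-log layout (user, URL, HTTP status, timestamp) and the function name are assumptions, not anything Aster or Cloudera ship.

```python
# Hypothetical Hadoop Streaming-style mapper for "Hadoop ETL":
# filter and transform raw web-log records before they are bulk-loaded
# into a warehouse staging table. The 4-field tab-delimited layout
# (user, url, http status, timestamp) is assumed for illustration.
def etl_map(lines):
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:      # drop malformed records
            continue
        user_id, url, status, ts = fields
        if status != "200":       # keep successful page views only
            continue
        # transform: normalize the URL, keep only the columns we load
        yield "\t".join([user_id, url.lower(), ts])
```

In a real Streaming job the function would read stdin and write stdout, with Hadoop distributing input splits across the cluster; the warehouse's parallel loader would then pick up the filtered output, which is why load performance matters so much here.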

Case studies from comScore and LinkedIn demonstrated the power of MapReduce in processing petabytes of data. comScore is aiming to manage and analyze three months of detailed records (160 billion records) using Aster SQL-MR. LinkedIn, on the other hand, is using a combination of Hadoop and Aster's MapReduce capabilities and moving data between the two environments. Performance and parallel processing are important for efficiently managing this exchange of data. This latter message was repeated in several other case studies at both events.

Hadoop World had a much more open source, developer feel to it. It was organized by Cloudera and had about 500 attendees. About half the audience was using Amazon Web Services and was clearly experienced with Hadoop. Sponsors included Amazon Web Services, IBM, Facebook and Yahoo, all of whom gave keynotes. These keynotes were great for big numbers. Yahoo, for example, has 25,000 nodes running Hadoop (the biggest cluster has 4,000 nodes). Floor space and power consumption become major issues when deploying this level of commodity hardware. Yahoo processes 490 terabytes of data to construct its web index, which takes 73 hours to build and has grown 50% in a year. This highlights the issues facing many web-based companies today, and potentially other organizations in the future.

Although the event was clearly designed to evangelize the benefits of Hadoop, all of the keynotes emphasized interoperability with, rather than replacement of, existing systems. Two relational DBMS connectors were presented at the event: Sqoop from Cloudera, and Vertica's support for Cloudera's DBInputFormat interface. Cloudera also took the opportunity to announce that it is evolving from a Hadoop services company into a developer of Hadoop software.

The track sessions were grassroots Hadoop-related presentations. There was a strong focus on improving the usability of Hadoop and adding database and SQL query features to the system. On several occasions I felt people were reinventing the wheel, trying to solve problems that have already been solved by both open source and commercial database products. There is a clear danger in trying to expand Hadoop and MapReduce from an excellent system for the batch processing of vast quantities of information into a more generalized DBMS.

The only real attack on existing database systems came, surprisingly, from the financial services company J.P. Morgan. The presentation started off by denigrating current systems and presenting Hadoop as an open source solution that solved everyone's problems at a much lower cost. When it came to use cases, however, the speakers positioned Hadoop as suitable for processing large amounts of unstructured data with high data latency. They also listed a number of "must have" features for the use of Hadoop in traditional enterprise situations: improved SQL interfaces, enhanced security, support for a relational container, reduced data latency, better management and monitoring tools, and an easier-to-use developer programming model. Sounds like a relational DBMS to me. Somehow the rhetoric at the beginning of the session didn't match the more practical perspectives of the latter part of the presentation.

In summary, it is clear that Hadoop and MapReduce have an important role to play in data warehousing and analytical processing. They will not replace existing environments, but will interoperate with them when traditional systems are incapable of processing big data and when certain sectors of an organization use Hadoop to mine and explore the vast data mountain that exists both inside and outside of organizations. This makes the current trend toward hybrid RDBMS SQL and MR solutions from companies such as Aster Data, Greenplum and Vertica an interesting proposition. It is important to point out, however, that each of these vendors takes a different approach to providing this hybrid support and it is essential that potential users match the hybrid solution to application requirements and developer skills. It is also important to note that Hadoop is more than simply MapReduce.   

If you want to get up to speed on all things Hadoop, read some case studies, and gain an understanding of its pros and cons versus existing systems, then get Tom White's (I am not related!) excellent new book, "Hadoop: The Definitive Guide," published by O'Reilly.

Tue, 06 Oct 2009 13:37:35 -0700
Analyst Blogging: Let's Have Some Etiquette!

I commented in my previous blog entry that the controversy over ParAccel's TPC-H benchmark has become quite heated. This is especially true on Curt Monash's blog, where at one point he made some personal comments about Kim Stanick, ParAccel's VP of Marketing. See this link for details.

This is the second blog this month that I have read where an analyst makes an attack, not only on the vendor, but also on one of its employees. The other blog (and an associated article) was by Stephen Few, entitled "Business is Personal - Let's Stop Pretending It Isn't." See this link for details.

The good thing about social computing is that it provides a fast way of sharing and collaborating on industry developments. However, these technologies have the same problem as e-mail and instant messaging: they enable people to react immediately to something that upsets or annoys them. With blogging, unlike e-mail and instant messaging, everyone gets to see the results!

As analysts, our job is to write balanced reviews of industry developments that provide useful information to the reader. My concern is that some analysts are behaving as though they are on cable television or writing for the tabloids. I believe we can critique a company and its products without attacking its employees. Personal attacks by analysts are unprofessional, even when a company fights back against a review it takes exception to. What do you think?

Thu, 25 Jun 2009 13:52:42 -0700
The ParAccel TPC-H Benchmark Controversy

ParAccel, one of the new analytic DBMS vendors, recently announced some impressive TPC-H benchmark results. A good review of these results can be found on Merv Adrian's blog at this link.

Not everyone agreed with Merv's balanced review. Curt Monash commented that "The TPC-H benchmark is a blight upon the industry." See his blog entry at this link.

This blog entry resulted in some 41 (somewhat heated) responses. At one point Curt made some negative comments about ParAccel's VP of Marketing, Kim Stanick, which in turn led to accusations that his blog entry was influenced by personal feelings.

I have two comments to make about this controversy. The first concerns the TPC-H benchmark and the second is about an increasing lack of social networking etiquette by analysts.

TPC benchmarks have always been controversial. People often argue that they do not represent real-life workloads; what this really means is that your mileage may vary. These benchmarks are expensive to run, and vendors throw every piece of technology at them in order to get good results. Some vendors are even rumored to have added special features to their products to improve the results. The upside is that the benchmarks are audited and reasonably well documented.

The use of TPC benchmarks has slowed over recent years. This is not only because they are expensive to run, but also because they have less marketing impact than in the past. In general, they have been of more use to hardware vendors because they demonstrate hardware scalability and provide hardware price/performance numbers. Oracle was perhaps an exception here because they liked to run full-page advertisements saying they were the fastest database system in existence.

TPC benchmarks do have some value to both the vendor and the customer. The benefits to the vendor are increased visibility and credibility. Merv Adrian described this as a "rite of passage." It helps the vendor get on the short list. For the customer, these benchmarks show the solution to be credible and scalable. All products work well in PowerPoint, but a TPC benchmark demonstrates that the solution is more than just vaporware.

I think most customers are knowledgeable enough to realize that the benchmark may not match their own workloads or scale as well in their own environments. This is where the proof of concept (POC) benchmark comes in. The POC enables the customer to evaluate the product using their own workloads.

TPC benchmarks are not perfect, but they do provide some helpful information in the decision making process.

I will address the issue of blog etiquette in a separate blog entry.  

Thu, 25 Jun 2009 13:43:10 -0700
More Information on Data Warehousing in the Cloud and SaaS BI

My post a couple of days ago about data warehousing in the cloud led to requests for more information about this topic and related SaaS BI solutions.

Claudia Imhoff and I recently published a research report on the BeyeNETWORK entitled "Pay as You Go: Software-as-a-Service Business Intelligence and Data Management." The report was sponsored by Blinklogic, Host Analytics, PivotLink and SAP BusinessObjects, who offer SaaS BI solutions. It was also sponsored by Kognitio, who (like Aster, Greenplum and Vertica, mentioned in my previous blog) have a data-warehousing-in-the-cloud offering. The report discusses SaaS BI and data warehousing and reviews the pros and cons of this type of deployment model.

The report can be found on

Thu, 11 Jun 2009 16:58:09 -0700
Data Warehousing in the Cloud Gains Momentum

The use of cloud computing for data warehousing is getting a lot of attention from vendors. Following hot on the heels of Vertica's June 1 announcement of its Analytic Database v3.0 for the Cloud came yesterday's Greenplum announcement of its Enterprise Data Cloud™ platform and today's announcement by Aster of .NET MapReduce support for its nCluster Cloud Edition.

I have interviewed all three vendors over the past week, and while there are some common characteristics in their approaches to cloud computing, there are also some differences.

Common characteristics include:
  • Software-only analytic DBMS solutions running on commodity hardware
  • Massively parallel processing
  • Focus on elastic scaling, high availability through software, and easy administration
  • Acceptance of alternative database models such as MapReduce
  • Very large databases supporting near-real-time user-facing applications, scientific applications, and new types of business solutions
The emphasis of Greenplum is on a platform that enables organizations to create and manage data warehouses and data marts using a common pool of physical, virtual, or public cloud infrastructure resources. The concept here is that multiple data warehouses and data marts are a fact of life, and the best approach is to put these multiple data stores onto a common and flexible analytical processing platform that provides easy administration and fast deployment using good-enough data. Greenplum sees this approach being used initially on private clouds, with the use of public clouds growing over time.

Aster's emphasis is on extending analytical processing to the large audience of Java, C++ and C# programmers who don't know SQL. They see these developers creating custom analytical MapReduce functions for use by BI developers and analysts who can use these functions in SQL statements without any programming involved.

Although MapReduce has typically been used by Java programmers, there is also a large audience of Microsoft .NET developers who potentially could use MapReduce. A recent report by Forrester, for example, shows 64% of organizations use Java and 43% use C#. The objective of Aster is to extend the use of MapReduce from web-centric organizations into large enterprises by improving its programming, availability and administration capabilities over and above open source MapReduce solutions such as Hadoop.
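To make the division of labor concrete, here is a sketch (in Python, purely for illustration; this is not Aster's actual API) of the kind of reusable analytical function a programmer might write once. Sessionization, shown here, is the example Aster often cites; an analyst would then invoke the equivalent SQL-MR function from a query without writing any procedural code.

```python
# Illustrative only: a reusable "analytical function" a programmer writes
# once. Given clickstream rows sorted by (user, timestamp), assign a
# session number per user, starting a new session whenever the gap
# between a user's clicks exceeds `timeout` seconds.
def sessionize(rows, timeout=1800):
    last_seen = {}    # user -> timestamp of that user's previous click
    session_id = {}   # user -> current session number
    out = []
    for user, ts in rows:
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_id[user] = session_id.get(user, 0) + 1  # new session
        last_seen[user] = ts
        out.append((user, ts, session_id[user]))
    return out
```

In the hybrid model the analyst never sees this code: the function is registered once and then applied to any table from within a SQL statement, with the engine running it in parallel across partitions.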

Vertica sees its data warehouse cloud computing environment being used for proof-of-concept projects, spillover capacity for enterprise projects, and software-as-a-service (SaaS) applications. Like Greenplum, it supports virtualization. Its Analytic Database v3.0 for the Cloud adds support for more cloud platforms, including Amazon Machine Images and early support for the Sun Compute Cloud. It also adds several cloud-friendly administration features based on open source tools such as Webmin and Ganglia.

It is important for organizations to understand where cloud computing and new approaches such as MapReduce fit into the enterprise data warehousing environment. Over the course of the next few months my monthly newsletter on the BeyeNETWORK will look at these topics in more detail and review the pros and cons of these new approaches.

Tue, 09 Jun 2009 00:00:01 -0700
Aster Data Systems Announcement Demonstrates Interest in Data Warehousing in the Cloud Using MapReduce

Aster Data Systems has announced nCluster Cloud Edition. The Aster nCluster is an analytical database system for supporting very large relational databases. Aster joins companies such as Vertica and Kognitio in providing data warehousing capabilities on a cloud computing platform. Aster supports both the Amazon and AppNexus cloud computing environments.

This announcement demonstrates the growing demand for analytical processing from companies that have a significant presence in the web marketplace. One user of the Cloud Edition, for example, is Didit, a web ad marketing company. These types of companies need to analyze huge amounts of web-related data.

One interesting aspect of Aster is its support for MapReduce, which reflects a growing trend among database companies: Greenplum has MapReduce support, and IBM is working on it (the System S research project). MapReduce provides a framework for massively parallel processing and is used by a number of web-centric companies such as Google. A key distinguishing feature of Aster is that it supports custom SQL functions that exploit MapReduce. MapReduce capabilities could become a key differentiator in high-volume cloud computing data warehousing environments.

You can find out more about the use of MapReduce in database systems in an article I published on the BeyeNetwork entitled "Are Relational Database Systems Keeping Up with the Information Processing Needs of Companies?"

Given my database background, it's exciting to see database products moving away from being commodity items and once again offering some important distinguishing features.

Tue, 10 Feb 2009 08:45:29 -0700
Microsoft Revamps its PerformancePoint BI Strategy
Microsoft's PerformancePoint Server 2007 comprises three components: business performance management (BPM), analytics, and planning. The BPM component evolved from the Microsoft Business Scorecard Manager, whereas the analytics component is based on a subset of the functionality acquired from ProClarity. The planning component was a new component developed by the Microsoft BI group, aimed specifically at financial planning and budgeting.

Today's announcement breaks out the BPM and analytics components from PerformancePoint and merges them into the Enterprise Edition of Microsoft SharePoint Server. Existing SharePoint Enterprise Edition users (with Software Assurance) will now get these components as a part of their licensing agreement. These customers will be able to download the PerformancePoint components starting April 1.

In the summer, Microsoft will release Service Pack 3 of PerformancePoint 2007. This will be the final release of the product, which will be supported for ten years.

Microsoft's strategy is to move the responsibility for financial planning and budgeting to the Microsoft Dynamics FRx and Forecaster products. However, horizontal planning capabilities will continue to be added to Microsoft SQL Server, Excel, and SharePoint over future releases.

This new direction makes sense for Microsoft. Although Microsoft was emphasizing the BPM and planning capabilities of PerformancePoint, it was achieving limited success in these areas. Instead, the majority of customers were buying the product for its analytics capabilities. This was especially true for ProClarity users.

Another reason why this makes sense is that Microsoft SharePoint is a very successful product, and this is leading to companies purchasing related Microsoft solutions. Over 80 percent of PerformancePoint customers, for example, are also SharePoint Server users. The penetration of SharePoint in the market is also a key factor in the success of SQL Server and its BI components.

Given that Microsoft Office is also increasingly being integrated with Microsoft SharePoint, customers will now be motivated to purchase Microsoft Office, SQL Server, and SharePoint Server as a product set in order to deploy business intelligence and related collaborative tools to a mass business user audience.

Fri, 23 Jan 2009 09:00:00 -0700
Using the R Programming Language for Data Analysis

I came across this article in the New York Times about the R programming language. It was interesting to note that it was the number two most-read article in the technology section.

The article suggests R is a threat to SAS. Any perspectives on this or the use of R for data analysis?

Wed, 07 Jan 2009 08:06:46 -0700
BI Predictions for 2009: Whatever It Takes to Get the Job Done

Well, it's the last day of 2008, and it's tradition at this time of year to make predictions for the coming year. If the financial chaos of the last few months continues into 2009, which most financial pundits say it will, then the IT industry is heading for a tough time. This makes predicting industry directions really difficult, because IT organizations are less inclined to purchase new products and technologies when budgets are tight.

The business intelligence (BI) marketplace has often been immune to industry downturns. This is because companies often turn to BI in difficult times to help them identify areas where revenues can be increased and costs can be reduced. This is especially the case in front office sales, marketing, and support organizations. Given the potential size of the coming downturn, however, can even BI be immune? I doubt it.

I believe, however, that there are ways BI can ride out the coming storm and be of benefit to the business. I think the main task that organizations should focus on is using new BI technologies to reduce costs (rather than increasing revenues). This can be achieved by reducing the cost of delivering new BI business solutions and by increasing business user productivity.

The BI solutions that will have the most impact in 2009 will be those that provide IT and business users with quick, low-cost approaches for discovering, accessing, integrating, analyzing, delivering and sharing information in a way that helps business users become more productive and more self-sufficient.

This means that there will be increased interest in open source software, BI software-as-a-service, low-cost application appliances, search, the integration of BI with collaborative and social computing software, rich internet applications, web syndication, and data and presentation mashups. Many of these solutions will come from small innovative BI companies, rather than large software companies who are still struggling to integrate the morass of BI software they acquired in 2008.

The technologies mentioned support low cost and fast BI application deployment. Many of them will be used by line-of-business IT rather than the enterprise IT organization. This could result in a turf war where enterprise IT tries to control and govern the use of these new technologies by the business. This would be a huge mistake. Instead enterprise IT should look for best practices in the use of these technologies by business groups, replicate them in other parts of the organization, and look for ways of incorporating the cream of the crop into the existing IT environment.

The purists will cry that this will lead to anarchy and islands of data and software. If that is the case, then so be it. In the coming 12 months we need to do whatever it takes to be productive and reduce short-term costs. This is not the time for fancy architectures, purist approaches, academic debates, or large projects.

Have a great 2009!

Wed, 31 Dec 2008 16:14:00 -0700
Will MapReduce Start a New Relational Database War?

Relational database systems, such as IBM DB2 and Oracle Database, have undergone over a quarter century of development. During that time they have managed to successfully fight off competing database technologies for mainstream database management. Do you remember the object/relational wars of the eighties?

MapReduce, a software framework introduced by Google to support parallel processing over petabyte-scale files, has garnered significant attention of late. IBM is experimenting with it in conjunction with Google, and Greenplum recently announced support.
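For readers unfamiliar with the model, the essence of MapReduce fits in a few lines: a map function emits key/value pairs, the framework groups the pairs by key (the "shuffle"), and a reduce function aggregates each group. The toy single-process word count below is only a sketch; frameworks such as Hadoop run the same two user-supplied functions in parallel across thousands of machines.

```python
from itertools import groupby
from operator import itemgetter

# Toy, single-process illustration of the MapReduce programming model.
def map_fn(doc):
    # map: emit a (key, value) pair for each word in a document
    for word in doc.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # reduce: aggregate all values emitted for one key
    return (word, sum(counts))

def mapreduce(docs):
    pairs = [pair for doc in docs for pair in map_fn(doc)]
    pairs.sort(key=itemgetter(0))                      # the "shuffle"/sort
    return [reduce_fn(key, (v for _, v in group))
            for key, group in groupby(pairs, key=itemgetter(0))]
```

The appeal is that the programmer writes only `map_fn` and `reduce_fn`; partitioning, distribution, and failure handling are the framework's job, which is exactly what the relational camp argues a parallel DBMS already provides declaratively.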

The significant interest in MapReduce, and in related technologies such as Hadoop and HDFS, has led to a backlash from the relational camp. David DeWitt and Michael Stonebraker have been especially outspoken on the topic.

Here is a small quote from their thoughts on the topic:

"As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:

1. A giant step backward in the programming paradigm for large-scale data intensive applications

2. A sub-optimal implementation, in that it uses brute force instead of indexing

3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago

4. Missing most of the features that are routinely included in current DBMS

5. Incompatible with all of the tools DBMS users have come to depend on"

Does this mean the database wars are starting up again?

My opinion is that MapReduce is not intended for general purpose commercial database processing and is therefore not a major threat to relational systems. However, it does have its uses (as Google has demonstrated) for certain types of high volume processing. It also demonstrates that as data volumes get bigger, and the complexity of data and data structures increases, other types of database technology may start to gain traction in certain niche marketplaces. The use by IBM of the SPADE language, instead of StreamSQL, in its InfoSphere Streams product (System S) also demonstrates the changes going on in the database market.

What do you think?

Tue, 25 Nov 2008 16:20:38 -0700
Mobile BI: Do the Apple iPhone and Google G1 Have a Role to Play?

The Apple iPhone has had a dramatic impact on the mobile phone industry, and the new Google G1 is also aiming to take a slice of this market. So far, however, these devices have been targeted primarily at personal users. The question is, "What role, if any, will these gadgets have in the business environment for applications such as mobile BI?"

At the recent SAP TechEd, I had the opportunity to address this topic with John Schwarz, CEO of Business Objects. I made the point that mobile computing was enjoying more success in Europe than in the US, and asked him if he thought the Apple iPhone would change this. He agreed that mobile computing use was higher in Europe, but felt the majority of it was still for personal purposes. He noted that mobile BI was seeing growth in both the US and Europe. However, he said the device of choice for mobile BI was still the BlackBerry, because its architecture is better suited to this type of processing. So far, he said, Business Objects has seen little demand for iPhone support.

Don Campbell, CTO of Cognos, took the same position when I asked him this question at a recent IBM Cognos analyst meeting. He said IT is still primarily supplying BlackBerrys for business use, but in some cases will support personally purchased iPhones. He said the iPhone still doesn't have the promised capability to run processes in the background, which limits its use for BI. He noted that the new Google G1 has an excellent development platform, and that if the device is successful it could have a major impact on mobile business applications. He also made the interesting point that Windows Mobile is the preferred mobile platform for packaged embedded solutions used in locations such as hospitals.

I would be interested to hear from other people about how their organization is deploying mobile BI solutions.

Tue, 30 Sep 2008 17:41:28 -0700
Major Changes Going on in Microsoft BI

The last few weeks have seen some major changes in Microsoft BI: the Datallegro acquisition, the announcement that Bill Baker was leaving, and the release of SQL Server 2008. For me, the first two changes are likely to have the most impact. SQL Server 2008 is well covered on the Microsoft Web site, so I won't address it here.

The acquisition of Datallegro created a significant amount of interest, and over the last two weeks I have discussed the acquisition with a wide range of people. Although opinions vary, some general consensus has emerged from those discussions.

The first question is why did Microsoft acquire Datallegro? The answer is that Microsoft wants to market SQL Server as a solution for large-scale data warehousing, and to do this they need to compete with the main DBMS vendors who have massively parallel database products. The Microsoft research group in San Francisco was spearheading key efforts to expand SQL Server in this area, but with the tragic death of Jim Gray it would appear the group lost momentum. To regain this momentum, Microsoft felt it needed to purchase an MPP database appliance.

Why Datallegro? There are a wide range of database appliances on the market, but some of them use proprietary hardware and software techniques, and many of them are tightly integrated into the underlying system. What Microsoft needed was a conventional relational DBMS appliance where they could quickly replace Linux with Windows and the open source relational DBMS with SQL Server. A smaller number of appliances fit this requirement. Datallegro was one of them, and, as we saw, the ultimate winner.

What has Microsoft bought? This is a more controversial question. The first thing to note is the purchase price. The rumor mill reports the number to be $275 million. From my perspective this is a staggering sum for a company with a limited track record in data warehousing and few customers. Many people I spoke to were appalled by the sum paid by Microsoft. The feeling is that Microsoft purchased a marketing position rather than any real technology. One shouldn't of course underestimate the power of this marketing position. Whereas I don't think IBM or Teradata will feel threatened by the acquisition, it does put Oracle in a difficult position.

Oracle has spent years trying to sell its shared everything solution as a competitor to shared nothing MPP approaches for data warehousing. The result is that it is losing market share to competitors such as IBM, Teradata and appliance vendors in the high-end data warehousing sector. From a marketing perspective, Microsoft is now adding to this competitive pressure. Oracle now has the choice of eating humble pie and acquiring or building an MPP solution, or seeing the continuing erosion of its large-scale data warehousing market share as customers see the benefits of alternative solutions. My money is on Oracle burying its head in the sand and doing nothing.

The remaining question is whether Microsoft will succeed in using the Datallegro acquisition to penetrate high-end data warehousing. Given that SQL Server 2008 has just shipped, it means that the next release of SQL Server will not be for about another three years. A lot can happen in that time. Stuart Frost, the CEO of Datallegro, disputes this three year figure in blogs he has written, but Microsoft has a more rigid development, test, and release cycle than startups like Datallegro do. Even if the SQL Server MPP capability can be put into an interim release, it is unlikely we will see anything for two years.

Another issue concerns how Microsoft is managing its data warehousing development. It would appear that the BI team in Redmond has ceded Microsoft's data warehousing strategy to the SQL Server group. In turn, the SQL Server group has set up a data warehousing center of excellence at the Datallegro HQ in Southern California. Datallegro has limited data warehousing expertise, and for Redmond to have this remote group driving its data warehousing strategy appears very risky.

This brings me to the departure of Bill Baker to become CTO of Visible Technologies. This is a major loss for Microsoft. Bill is a visionary, an incredible leader, and was the driving force behind putting Microsoft on the BI and data warehousing roadmap. Microsoft has now lost two key BI and data warehousing people, Jim Gray and Bill Baker. They are irreplaceable, and setting up a data warehousing center of excellence in Southern California doesn't come close to making up for this loss.

Wed, 13 Aug 2008 18:57:47 -0700
Thoughts from the Pacific Northwest BI Summit

Scott Humphrey's Pacific Northwest BI Summit is my favorite event of the year. This is not only because it is held at the Weasku Inn on the beautiful Rogue River in Oregon (only a 40-minute drive from Ashland, where I live), but also because the vendors, consultants, and analysts who attend come together and share ideas in a unique way that is unlike any other event I attend during the year. Marketing hype and vendor competition are forgotten, and everyone has down-to-earth formal and informal discussions on the state of the industry and its likely direction.

This year was the seventh year the event has been held, and it surpassed even the excellence of previous summits. The four analysts and consultants (Jill Dyche, Claudia Imhoff, William McKnight, and I) were joined by representatives from Composite Software, DataFlux, Eyeris, HP, IBM Cognos, Infocentricity, Microsoft, ParAccel, PivotLink, SAP Business Objects, Teradata, Xactly Corporation, and of course the BI Network.

The informal discussions covered a wide range of topics, from BI to politics! The acquisition of DATAllegro by Microsoft had just happened, and this was a big topic of discussion. Although views varied, several people expressed the opinion that Microsoft was really buying an enterprise marketing position (especially against Oracle), rather than any real technology. By this time the industry blogging machine was working overtime, and several blogs had already reported the purchase price to be $275 million, which staggered everyone.

Towards the end of the summit, news was leaking out that Bill Baker was leaving Microsoft and everyone agreed this was a tragic loss for the company. Microsoft is certainly going through some dramatic changes in the BI area.

The formal discussions focused on Software as a Service (SaaS) BI (led by Claudia Imhoff), CRM (led by Jill Dyche), Operational BI (led by me), and IT Leadership (led by William McKnight). The volume of information and discussion is too lengthy to report here, but the BI Network will be releasing a number of podcasts on some of the discussions in the near future. Podcasts with each of the vendors are already available.

Some key points I got from these four sessions were:

1. There is considerable interest in SaaS BI from both vendors and customers. SaaS BI is being used not only by SMBs, but also by groups within large organizations. It is often used to get a project started, and many companies would like to bring the project in house once it matures. Many people felt that the pay-as-you-go model will gradually become the norm for both SaaS and on-premises solutions (as pointed out at the summit, "on-premises" is correct English usage, but "on-premise" is not). Lastly, like in-house application packages in the past, SaaS companies and solutions will merge and be acquired to provide sets of application solutions, rather than remain as stand-alone silos.

2. CRM is going through a reemergence with companies focusing on micromarketing, social computing as a new CRM information source, and increased interest in master data management.

3. There was universal agreement that operational BI is a big growth area, but that the range of solutions and vendors both inside and outside BI is large and confusing. One point of discussion was the convergence of operational BI with business process management and complex event processing. Other discussions focused on the impact of operational BI being process-driven, rather than data-driven, and on whether BI is the best term to use to describe analytical and decision-making solutions moving forward.

4. The discussion on IT leadership generated many different viewpoints. There was universal agreement that companies need to focus less on reducing IT costs, and more on recognizing IT as an essential business component of the organization, in the same way, for example, that HR is. There was also a lot of discussion about the need for IT to modernize its thinking and create a more flexible governance environment to handle emerging technologies such as social computing.

The summit offered several opportunities to enjoy the many tourist activities of the Rogue River. The highlight for me was a visit to the Wildlife Images animal rehabilitation and education center. Everyone fell in love with a badger called Nubs, with the result that the group donated $1,000 to support Nubs and the other animals at the center.

I can't wait until next year!

Wed, 06 Aug 2008 10:26:53 -0700
Microsoft Acquires Appliance Vendor DATAllegro: The Beginning of a Trend?

In a surprise move today, Microsoft announced they are acquiring data warehouse appliance vendor DATAllegro. No purchase price was disclosed.

Stuart Frost, CEO of DATAllegro, had some interesting things to say about the acquisition on his blog immediately after the announcement. Here are some excerpts:

"... just as the VC community started to recover from the Internet ‘bubble’ in 2003, I came up with the vision for DATAllegro. Since that time, we’ve raised just under $65m in venture capital and created a hugely successful exit for my investors, my great team and last, but not least, me!"

"As soon as the acquisition closes, we'll start the work of moving our technology from Ingres & Linux to SQL Server and Windows. Our feasibility studies over the last few months indicate that SQL Server is a significant improvement in terms of performance - especially in key areas such as star joins, I/O throughput and in-memory operations. The engineering team here at DATAllegro is VERY excited about the next version of the product."

"Over the last few years, it's been incredibly frustrating to have prospects tell us that we have the best technology, vision and people, but that they can't buy from a startup - I think that will change radically under the Microsoft brand! As a result, I'm starting to think that it could be a long term home for me. It will certainly make a nice change from having to raise VC money every few months!"

"It will be interesting to see the impact this acquisition has on the rest of the market. My guess is that the other incumbents will scramble to respond to Microsoft's pre-emptive strike and that this could lead to a few of the other startups being acquired. The ones left out will find life very hard over the next few years."

There are a few interesting points to note in these quotes. The first is that DATAllegro will move from using an open source DBMS to SQL Server. Microsoft wants to move SQL Server up market to compete with the high-end data warehousing DBMSs such as IBM DB2, Oracle, and Teradata. It is interesting to note that at a Teradata analyst meeting this week, Teradata talked about how it is moving in the opposite direction, providing lower-cost and smaller appliances for data marts. Teradata hopes that this strategy will introduce the company to organizations outside the Fortune 2000, who will then hopefully move gradually to an enterprise data warehouse environment based on Teradata.

Before the acquisition, DATAllegro was clearly targeting Teradata customers by suggesting that they could offload some of their Teradata workloads to a lower-cost DATAllegro solution, i.e., DATAllegro was selling coexistence, rather than rip and replace. Obviously this strategy will change now that Microsoft is driving the marketing program.

The second point to note is that DATAllegro had burned through nearly $65 million in VC funding. Like many startups, they were achieving visibility, but not sales. In fact, it was difficult to find good DATAllegro case studies. To be fair, this is true for several of the other appliance vendors as well. Netezza, which has been in this market longer, is an exception here. I wonder if Microsoft perhaps took advantage of the VCs getting nervous about the viability of DATAllegro. We may get some insight into this when the purchase price is leaked.

I think Stuart's comment about other startups being acquired is correct. There are simply too many appliance vendors in the market. The question is who is likely to acquire them. Leading DW DBMS vendors such as IBM, Oracle, and Teradata are unlikely candidates. One possibility is HP, which is struggling to get NeoView off the ground. Another is Sun, which has relationships with several appliance vendors. There is no question, however, that several of the appliance vendors are likely to go out of business. History shows that selling price/performance is not a good long-term strategy. Success may come for those appliance vendors that move toward selling more application-focused solutions.

Regardless, the DBMS and DW markets continue to be exciting!

Thu, 24 Jul 2008 13:15:02 -0700
Organizations Need to Realize That Not All Web Content is Free

There is no question that the Web has changed the way we consume information. This is because it provides us with fast access to a vast virtual information store. This information store has become so easy to access with modern search engines that we tend to assume that everything on the Web is free. This is not the case, and this is a potential minefield, not only for information publishers that wish to protect their intellectual property (IP) on the Web, but also for organizations that consume and use that IP.

Wed, 30 Apr 2008 20:01:59 -0700