Blog: Colin White

Colin White

I like the various blogs associated with my many hobbies and even those to do with work. I find them very useful and I was excited when the Business Intelligence Network invited me to write my very own blog. At last I now have somewhere to park all the various tidbits that I know are useful, but I am not sure what to do with. I am interested in a wide range of information technologies and so you might find my thoughts will bounce around a bit. I hope these thoughts will provoke some interesting discussions.

About the author

Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

September 2007 Archives

Search is gaining interest in the business intelligence (BI) space, and several people have begun asking about the difference between querying data (using a traditional database query language) and browsing data using a search tool, and about which approach to use when. This blog entry is an attempt to put a stake in the ground and to encourage a discussion on this topic.

Let's first discuss traditional database queries. In the BI environment, a BI tool or application issues a query against a database system. The query is written in a formalized database language such as SQL (note that the S stands for structured). A GUI may be used to hide the complexities of SQL. The results of the query can be browsed, reported on, analyzed, etc. The emphasis of this style of processing is on the analysis of structured data. It is not designed for the ad hoc browsing of information from multiple unrelated data sources.
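To make this concrete, here is a minimal sketch of the kind of SQL a BI tool might issue on a user's behalf. It uses Python's built-in sqlite3 module, and the sales table, its columns, and its rows are all made up for illustration.

```python
import sqlite3

# Hypothetical sales table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 120.0),
        ("East", "Gadget", 80.0),
        ("West", "Widget", 200.0),
    ],
)


def revenue_by_region(connection):
    """Return (region, total) rows, the sort of query a BI GUI could generate."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


rows = revenue_by_region(conn)
for region, total in rows:
    print(region, total)
```

Notice that the query only works because the table and column names are known in advance, which is the point about structure that follows.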

Typically, database query processing is formalized and standardized, both in terms of the query language itself and the database structures accessed. There are natural language interfaces for doing this type of processing, but they simply convert the natural language requests into database language statements.

With database query processing, you have to have some knowledge (i.e., metadata) about the structure of the data before you can access it. There has been a move to relax this rigidly structured approach by adding database structure and language support for accessing and analyzing semi-structured data such as XML.
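As a rough illustration of what semi-structured means, the sketch below queries a small XML fragment using Python's standard xml.etree.ElementTree module rather than a database system's XML support. The orders document and its element names are hypothetical. The element names act as metadata embedded in the data itself, and one order carries an optional note element that the other lacks.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data: the second order has an extra <note>.
ORDERS_XML = """
<orders>
  <order id="1"><customer>Acme</customer><total>250.00</total></order>
  <order id="2"><customer>Globex</customer><total>75.50</total><note>Rush delivery</note></order>
</orders>
"""


def orders_over(xml_text, minimum):
    """Return the ids of orders whose <total> exceeds the given minimum."""
    root = ET.fromstring(xml_text.strip())
    return [
        order.get("id")
        for order in root.findall("order")
        if float(order.findtext("total")) > minimum
    ]


def order_notes(xml_text):
    """Return {order id: note text} for the orders that carry a <note>."""
    root = ET.fromstring(xml_text.strip())
    return {
        order.get("id"): order.findtext("note")
        for order in root.findall("order")
        if order.find("note") is not None
    }


print(orders_over(ORDERS_XML, 100))
print(order_notes(ORDERS_XML))
```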

When it comes to unstructured data there are two options. The first is to transform all or some of the unstructured data into a structured or semi-structured format, and store the transformed data in a database system (together with any remaining unstructured data that has not been converted). The transformed data can be associated with existing structured data in the database. This transformed data can then be processed by database queries, and any associated unstructured data retrieved as a part of the result set.
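The first option can be sketched as follows, assuming a very simple case where a regular expression is enough to pull a customer number and a date out of freeform notes. The notes, the pattern, and the table layout are all hypothetical, and real text transformation tools are far more sophisticated. The extracted fields are stored next to the original text, so a database query can return the unstructured note as part of its result set.

```python
import re
import sqlite3

# Hypothetical freeform notes; only some contain extractable fields.
NOTES = [
    "Customer 1001 called on 2007-09-03 about a late delivery.",
    "Follow-up with customer 1002 on 2007-09-05; invoice dispute resolved.",
    "General note: the warehouse will be closed next week.",
]

# Illustrative pattern: a customer number followed later by an ISO date.
PATTERN = re.compile(r"[Cc]ustomer (\d+).*?(\d{4}-\d{2}-\d{2})")


def load_notes(connection, notes):
    """Store each note with whatever structured fields could be extracted."""
    connection.execute(
        "CREATE TABLE notes (customer_id INTEGER, note_date TEXT, raw_text TEXT)"
    )
    for text in notes:
        match = PATTERN.search(text)
        customer_id = int(match.group(1)) if match else None
        note_date = match.group(2) if match else None
        connection.execute(
            "INSERT INTO notes VALUES (?, ?, ?)", (customer_id, note_date, text)
        )


conn = sqlite3.connect(":memory:")
load_notes(conn, NOTES)

# A structured query that also retrieves the associated unstructured text.
result = conn.execute(
    "SELECT note_date, raw_text FROM notes WHERE customer_id = ?", (1002,)
).fetchall()
print(result)
```

The third note has nothing to extract, so it is stored with empty structured fields and remains reachable only by other means, such as a search tool.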

A second option for unstructured data is to access it directly using a search tool. With a search tool, information is accessed using search queries. As with database languages, these queries can be generated from a GUI. The results from search queries can be browsed to find the data of interest. The search results can also be passed to an analysis tool for further processing.

Search languages are designed for accessing freeform unstructured data. Of course, search queries can access structured data as well. Search query languages are less complex than database query languages because, unlike languages such as SQL, they are not designed for complex data retrieval and analysis.
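For contrast with SQL, here is a toy keyword search over freeform text, written as a minimal sketch rather than a description of how any particular search product works. It builds a simple inverted index (term to document ids) and returns the documents that contain every term in the query. The documents and function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical freeform documents keyed by an arbitrary id.
DOCUMENTS = {
    "memo-1": "Quarterly sales results for the eastern region",
    "email-7": "Customer complaint about late delivery in the eastern region",
    "report-3": "Delivery schedule for new products",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(DOCUMENTS)
print(search(index, "eastern region"))
print(search(index, "late delivery"))
```

The query is just a handful of words, with no knowledge of any schema required. The price is that the search can find documents, but it cannot aggregate or join them the way a database query can.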

To improve the accuracy of search queries, metadata can be extracted from the unstructured data using utilities supplied with the search tool, or by third-party vendors who offer taxonomy and information exploration tools. The better the metadata, the more accurate the search results are likely to be. The metadata adds semantic meaning to the unstructured data. In some cases, the metadata can be used to build faceted or taxonomy-driven search interfaces that filter the search results. The metadata can also be used to aid in the transformation of unstructured data into a semi-structured or structured format. It is these types of capabilities that separate enterprise search tools from internet search tools.
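A faceted interface can be pictured with the short sketch below. It assumes the search results already carry extracted metadata, and the field names (doc_type, department) and values are hypothetical. One function counts the values available for a facet, as a search GUI might show beside the results, and the other narrows the results to the facet values a user has selected.

```python
from collections import Counter

# Hypothetical search results annotated with extracted metadata.
RESULTS = [
    {"id": "memo-1", "doc_type": "memo", "department": "sales"},
    {"id": "email-7", "doc_type": "email", "department": "support"},
    {"id": "email-9", "doc_type": "email", "department": "sales"},
]


def facet_counts(results, field):
    """Count how many results fall under each value of a metadata field."""
    return Counter(item[field] for item in results)


def filter_by_facets(results, **selected):
    """Keep only the results whose metadata matches every selected facet."""
    return [
        item
        for item in results
        if all(item.get(field) == value for field, value in selected.items())
    ]


print(facet_counts(RESULTS, "doc_type"))
narrowed = filter_by_facets(RESULTS, doc_type="email", department="sales")
print([item["id"] for item in narrowed])
```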

The bottom line is that the database query and the search query approaches are starting to come together. However, search queries are designed for the browsing of less formalized and unstructured information, whereas database queries are intended for the analysis of structured and semi-structured data. As discussed, unstructured information can be transformed into a semi-structured format, and search results can be further analyzed by analysis tools. Both approaches use some form of query language.

Posted September 12, 2007 11:38 AM