Search Technology for the Enterprise

Originally published 27 April 2009

Today’s workers are used to “Googling” queries and getting results quickly and accurately. At work, however, these same workers often struggle to locate internal documents with the same speed and efficiency. Searching for a document in the workplace usually involves sifting through multiple pages of search results, which wastes time and money. And, because enterprise users are searching for specific information, not just the most popular answer, they expect more precise search results than they would get from the Internet. Simply put, the techniques that work well on the Web aren’t as well suited to enterprise search.

A study by the Association for Information and Image Management (AIIM) noted that, “49% of survey respondents ‘agreed’ or ‘strongly agreed’ that it is a difficult and time-consuming process to find the information they need to do their job.”

All search, whether on the Internet or within an enterprise, is powered by metadata, that is, data about data. Traditionally, metadata is the recorded information that describes a piece of data: its name, size, length and so on.

Finding information on the Internet is faster and more accurate than on an enterprise site because hyperlinks provide the Internet with naturally occurring, high quality metadata. Each time someone links text to a web page, Internet search engines interpret the linked text as metadata about that page, which in turn affects the page’s ranking in web search results.

Searching on an enterprise site, by contrast, is more difficult and laborious because this metadata is missing. Unlike the Internet, an enterprise has no textual links between documents and no implicitly created metadata that a search engine can use. Office documents are not naturally linked together, and there are too few corporate librarians to manually tag each office document with the appropriate metadata. An average enterprise also has more content types, formats and security measures than the Internet, which makes search more time-consuming. The result is that workers spend too much time sorting through pages of irrelevant results instead of doing their jobs.
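
To make the contrast concrete, here is a minimal sketch of how linked text can be collected as metadata about the page it points to. It is an illustration only, not how any particular search engine works: the class and function names, the sample pages and the URL are all hypothetical, and real engines weigh many more signals.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, linked text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a href="..."> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def anchor_metadata(pages):
    """Map each link target to the words other pages use to describe it."""
    metadata = defaultdict(list)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        for href, text in collector.links:
            metadata[href].extend(re.findall(r"[a-z0-9]+", text.lower()))
    return dict(metadata)


# Hypothetical pages that both link to the same document.
pages = [
    '<p>See the <a href="/hr/leave.html">parental leave policy</a>.</p>',
    '<p>Our <a href="/hr/leave.html">leave policy</a> changed.</p>',
]
print(anchor_metadata(pages))
```

The target page never has to describe itself: the words "leave" and "policy" arrive from the pages that link to it. A folder of office documents offers no such links, so this source of metadata is simply absent.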

The bottom line for enterprises with large amounts of data: It’s worth investing in the right search technology. The solution should automatically categorize enterprise content, thereby eliminating time-intensive manual categorization. In an organization with hundreds or even thousands of workers producing documents, it could take years to tag each employee’s electronic documents by hand.

By automatically tagging and categorizing their content, enterprises gain the kind of high quality metadata that hyperlinks supply naturally on the Internet, and so bridge the gap between enterprise and Internet search.

In order to achieve better enterprise search, enterprises should:

  • Install a system to automate the creation of metadata for existing content and for new content as it’s added to the server.

  • Categorize information into logical groups based on folksonomies, taxonomies and ontologies.

  • Define in advance what you want to understand from the documents, and check that the automated systems align with these goals. Perform a systematic human check of your automated search and content management tools at least once per quarter.

The key to managing your company’s content quickly and easily is being able to automatically generate metadata. An auto-categorization metadata system, the backbone of a successful content management system (CMS), is a proven solution for better search and retrieval. It not only improves accuracy and efficiency, but also saves time, money and resources. In today’s challenging economic climate, it’s hard to argue with that.
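
As a rough illustration of the recommendations above, the sketch below tags documents against a small taxonomy and then compares the automatic tags with human labels, in the spirit of the quarterly check. Everything in it is an assumption made for the example: the categories, the trigger terms, the sample documents and the function names are hypothetical, and commercial categorization systems use far richer linguistic rules than simple keyword matching.

```python
import re

# Hypothetical taxonomy: category -> trigger terms (an assumption for this sketch).
TAXONOMY = {
    "Finance": {"invoice", "budget", "revenue"},
    "Legal": {"contract", "liability", "clause"},
    "HR": {"onboarding", "payroll", "benefits"},
}


def auto_tag(text, taxonomy=TAXONOMY, min_hits=1):
    """Return categories whose trigger terms occur in the text.

    Categories are ordered by the number of distinct matching terms,
    with ties broken alphabetically.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {cat: len(words & terms) for cat, terms in taxonomy.items()}
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, n in ranked if n >= min_hits]


def spot_check(samples, tagger=auto_tag):
    """Fraction of human-labelled samples whose label is among the automatic tags."""
    agree = sum(1 for text, label in samples if label in tagger(text))
    return agree / len(samples)


doc = "The contract sets a liability cap; the invoice is attached."
print(auto_tag(doc))  # ['Legal', 'Finance']

# Hypothetical human labels used for a periodic review.
samples = [
    (doc, "Legal"),
    ("Payroll and benefits update", "HR"),
    ("Quarterly budget review", "Legal"),  # the reviewer disagrees with the tagger here
]
print(spot_check(samples))
```

A low agreement score in the spot check is a signal to revisit the taxonomy or its rules rather than to fall back on tagging every document by hand.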

For more information regarding enterprise and Internet search, please visit the Gilbane Group and CMS Watch.


  • Yves Schabes
    Dr. Yves Schabes co-founded the multilingual natural language technology company Teragram, a division of SAS, with Dr. Emmanuel Roche in 1997. Dr. Schabes has spent the past fifteen years working on issues relating to natural language processing and computer science. He is the author or editor of more than fifty international scientific publications and co-editor, with Roche, of Finite-State Language Processing (1997, MIT Press, Cambridge MA). Dr. Schabes is also an Associate to the Division of Applied Science, Harvard University, Cambridge, MA. Prior to founding Teragram, Dr. Schabes was a Senior Scientist at Mitsubishi Electric Research Laboratories in Cambridge, MA. He also held a position as a Research Associate at the University of Pennsylvania. Dr. Schabes has been a program committee member of many international scientific conferences and journals. He received a Ph.D. in Computer Science in 1990 from the University of Pennsylvania, Philadelphia, PA.


