Blog: William McKnight

William McKnight

Hello and welcome to my blog!

I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input to the topics.

About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at


Recently in Data Warehouses Category

The article "Top 10 Largest Databases in the World" lists exactly what its title promises, and the sizes are impressive indeed.

Speaking of large databases, I recently commented in an article that "The amount of information uptake into a corporation when RFID is implemented can be unprecedented. The largest data stores in the world soon will be in manufacturing and will comprise mostly item movement data."

Technorati tags: Data warehouse, RFID, DBMS

Posted November 1, 2007 2:35 PM

To say you should “pick” your data warehouse executive sponsor carefully would be a rather strange statement. Not many DW programs can pick their sponsor. Usually, it’s the other way around. The program staff must deal with the sponsor that has chartered that course of action for the organization. DW programs need to have that top-down driver and support at the executive level. Nonetheless, it is critical to success that the executive sponsor accepts his or her roles, and hopefully those roles are clear or there is an opportunity to make them clear.

I have worked with all manner of executive sponsors, from those who intuitively get it and put forward the effort required for success to those who need some guidance. Most are more than willing to listen and align their actions with successful practices for DW success. Generally, I ask my sponsors to invest in understanding DW systems, both in general and within their company; lead, as necessary, the governance meetings; provide overall direction for the DW; and keep the DW out of internal cross-fire.

I’ve stopped asking them to define data elements and transformation rules!

Technorati tags: Data Warehouse

Posted October 3, 2007 1:46 PM

End clients generally end up overrating the usability of packaged data models, whether as stand-alone models or those models inherent in their software packages. Sooner or later, some amount of what I call ‘original data modeling’ is going to be necessary in any enterprise. Modeling expertise is a must-have though companies go to enormous lengths sometimes to avoid it. I don’t advocate necessarily changing packaged models in applications. However, no untouched application models, or combinations thereof, are going to be completely sufficient for a data warehouse in Fortune clients.

Technorati tags: data warehouse, data model

Posted September 26, 2007 7:09 PM

Time for the Wednesday (or thereabouts) “What” (what I have learned…). OK, I seem to be endlessly prompted in my client work with these learnings so there’s no shortage of them, but sometimes I don’t have an elegant preamble to a blog entry. So, I’ll just say it.

You’ve got to tie that warehouse data back to source or users will cry foul. It doesn’t matter how dirty the source data is. If you want to change the data en route to the warehouse to clean it, fine, change it, but bring the original data as well in a different set of columns in order to prove your tie-out.
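As a minimal sketch of carrying the original value alongside the cleaned one, the following uses Python's built-in sqlite3 module. The table `dw_customer`, its columns, and the cleaning rule `clean_state` are hypothetical names made up for illustration; they do not come from any client system.

```python
import sqlite3

def clean_state(raw):
    """Hypothetical cleaning rule: trim whitespace and upper-case a state code."""
    return raw.strip().upper() if raw is not None else None

def load_with_originals(source_rows):
    """Load rows into an in-memory warehouse table, storing the untouched
    source value next to the cleaned value."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE dw_customer ("
        " customer_id INTEGER PRIMARY KEY,"
        " state_clean TEXT,"   # value changed en route to the warehouse
        " state_src TEXT)"     # original value, kept for tie-out
    )
    wh.executemany(
        "INSERT INTO dw_customer VALUES (?, ?, ?)",
        [(cid, clean_state(state), state) for cid, state in source_rows],
    )
    wh.commit()
    return wh

# Assumed sample source data: (customer_id, dirty state code)
source_rows = [(1, " tx"), (2, "MO "), (3, "mo")]
wh = load_with_originals(source_rows)

# Tie-out on the untouched column: each warehouse row is compared to source.
source_by_id = dict(source_rows)
mismatches = [
    cid
    for cid, src in wh.execute("SELECT customer_id, state_src FROM dw_customer")
    if source_by_id[cid] != src
]
cleaned = [
    row[0]
    for row in wh.execute("SELECT state_clean FROM dw_customer ORDER BY customer_id")
]
```

Because `state_src` is never modified, the comparison against source stays exact no matter how aggressive the cleaning rule becomes.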

Tie-out should make you more comfortable with your ETL as well. It sometimes involves adding pre-extract queries to the source data and post-load queries to the warehouse data. It sometimes involves ‘spot’ query checks, which can get tricky because the method used to pick your spot data can come under scrutiny. It also gets tricky when the ETL is run intra-day or real-time, when ETL cycles are at an absolute premium. However, you still need to do it IMO. These tie-out results go in your operational metadata.
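Here is one way the pre-extract and post-load queries might look, sketched with Python's built-in sqlite3 module. Everything named here is an assumption for illustration: the tables `orders` and `dw_orders`, the metadata table `etl_tie_out`, and the choice of row count plus summed amount as the control totals. Amounts are held as integer cents so the totals compare exactly.

```python
import sqlite3
from datetime import datetime, timezone

def control_totals(conn, table, amount_col):
    """Return (row count, summed amount) for one table."""
    sql = f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    return tuple(conn.execute(sql).fetchone())

def tie_out(src_totals, warehouse, run_id):
    """Run the post-load query, compare it with the pre-extract totals,
    and record the outcome in the operational metadata table."""
    dw_totals = control_totals(warehouse, "dw_orders", "amount_cents_src")
    passed = src_totals == dw_totals
    warehouse.execute(
        "INSERT INTO etl_tie_out VALUES (?, ?, ?, ?, ?, ?, ?)",
        (run_id, datetime.now(timezone.utc).isoformat(),
         src_totals[0], dw_totals[0], src_totals[1], dw_totals[1], int(passed)),
    )
    warehouse.commit()
    return passed

# Assumed sample source system
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 990), (3, 4000)])

# Assumed warehouse with a hypothetical operational metadata table
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount_cents_src INTEGER)")
warehouse.execute(
    "CREATE TABLE etl_tie_out (run_id TEXT, checked_at TEXT, src_rows INTEGER,"
    " dw_rows INTEGER, src_total INTEGER, dw_total INTEGER, passed INTEGER)"
)

# Pre-extract query: capture control totals before the load starts.
src_totals = control_totals(source, "orders", "amount_cents")

# Stand-in for the real ETL: copy the source rows across unchanged.
rows = source.execute("SELECT order_id, amount_cents FROM orders").fetchall()
warehouse.executemany("INSERT INTO dw_orders VALUES (?, ?)", rows)

first_run = tie_out(src_totals, warehouse, "run-001")

# Simulate a row lost in the warehouse, then check again.
warehouse.execute("DELETE FROM dw_orders WHERE order_id = 3")
second_run = tie_out(src_totals, warehouse, "run-002")

log = warehouse.execute("SELECT run_id, passed FROM etl_tie_out ORDER BY run_id").fetchall()
```

Control totals are cheap enough to run every cycle, which matters when ETL windows are tight. Spot checks on individual rows could be logged to the same table.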

Tie-out is part of weaning users from their old ways to the new way (the data warehouse way). It’s part of the bottoms-up approach to a successful data warehouse rollout. Ask key users what they will use to deem the warehouse effort successful – and do that and more. Remember, users are from Missouri - the show-me state - and IT is from Mars (according to many users I have dealt with.) And if they don’t ask about tie-out, do it anyway!

Technorati tags: data warehouse, ETL

Posted September 20, 2007 7:32 AM

Here’s one more on the theme of what I’ve learned. You’ve heard of Don’t Mess with Texas? Well, how about Don’t Mess with Excel!? Users love the sense of control over the data and the ability to perform their own calculations. (Other) BI tools will not sweep Excel out of any enterprise. However, precisely because of its flexibility, Excel is notorious as a source system for data warehousing and its applications need to get into a DBMS to serve the organization in that capacity.

DW teams often need to start as data providers to the organization, where knowledge workers from all over will pull for spreadsheets. Eventually, through value propositions (see previous ‘what I’ve learned’), those teams should increase their service to where they are providing BI and running the business, not just reporting on it. The Excel issue is a flashpoint for the whole ‘why do we exist’ issue for DW teams.

Technorati tags: data warehouse, Excel

Posted September 5, 2007 9:23 AM

In no particular order, I’m going to be addressing this topic in a series of blog entries, starting with the approach to the build.

While a top down approach may seem ideal, data warehouses get built bottoms-up. The best data warehouses are built bottoms-up, but the worst data warehouses are built extreme bottoms-up. By extreme, I mean without any sense of where it’s all going, costing, best practices or where the ROI is going to come from. Like a virus growing within the organization, so the data warehouse expands to encompass other random and redundant data, becoming important enough to keep around, but with an organization that’s never sure why and with increasing concern about what it doesn’t do. Eventually, it gets redone until enough top-down is inserted into the process to make it usable. So, in other words, injecting some top-down elements into data warehousing is essential, but don’t believe it’s going to be complete top-down.

Technorati tags: data warehouse

Posted August 28, 2007 8:11 AM

In the Data Warehousing industry, we are continuing to see the maturation of the value proposition and the management of risk. In the early days, the technology was experimental. Data Warehouse projects consumed millions of dollars on nothing more than the promise of “if we build it, I’m sure it will pay for itself. After all, XYZ company found out something that caused their warehouse project to pay for itself in only six months!” Vendors were great at sending the message that “all of your competitors are building these systems in secret, because they consider it to be a competitive advantage. We would share more information, but we are under non-disclosure.”

The promise of striking gold in them thar hills of data was the subject of serious boardroom conversations. And those that failed to achieve the promise, either because the system was never built, or because it was delivered late and way over budget, or because they didn’t find the nuggets of gold they had hoped for, kept quiet. They didn’t want their colleagues or competitors to know.

Now it is generally known that Data Warehouse projects can fail, and have failed, and as a result, fewer of them actually do fail. We understand the risks and how to manage them.

Here are several of the factors that have contributed to our ever-increasing success:

• Adoption of an iterative deliverable methodology, where large projects are divided into 90-day deliverables and the projects with the greatest ROI and highest probability of success are done first. Scalable technology has contributed significantly to minimizing the risk in up-front capital investments.
• An understanding that data quality is a major issue and must be evaluated up front, oftentimes as part of an assessment. You can’t make a gourmet dinner out of garbage.
• An understanding that organizations must cooperate in order to integrate data, that project teams must be organized and executive sponsors identified accordingly.
• The technology to build, maintain, manage, and mine the systems is much better, and there are many more experienced technologists available.

Posted July 25, 2007 3:38 PM

I'm getting concerned about the data warehouse. It has served us well, but can the current profile of data warehouses out there handle the next 10 years or will widespread changes be necessary? Consider that most data warehouses out there do not follow best practices and are therefore little more than dumps of operational data where history collects and from which reports are run. This only solves some of the challenges associated with going it alone with just operational data, which are:

Data access
Reporting capabilities
Concurrency between query and operational needs
Structure for data access
Data quality for data access
Data integration
Storage of history data

Notably, it is the concurrency and history issues that instigate many data warehouse programs. However, integration is largely limited to data sharing a common database instance - which is good, but leaves too much complexity to the data access layer, where the end users find the data access tools too complex already. Building summaries and making sense of the data warehouse structure and data is exasperating, especially without metadata, which most data warehouses lack at adequate levels. As a result, current users mostly skim the surface of their true needs.

Also, data quality is addressed only selectively in data warehouse programs out there. Many remain afraid to change operational data, even if it is wrong. It needs to be fixed operationally anyway, and that just isn't happening enough.

So, how is data warehousing supposed to fit into this new world of data explosion, real-time requirements and a need for process-orientation?

1. We can't continue to delay the calculation, assimilation and distribution of master data until the data warehouse
2. Business intelligence, as a discipline, must be extended beyond reporting and even dashboarding and get involved in business processes using enterprise information integration and operational business intelligence approaches; these open up the possibilities beyond post-operational, after-the-fact BI
3. We need to embed business intelligence in operational processes and try a lot harder to fix data quality in the operational environment; the longer action is delayed, the less valuable it is; this can be the equivalent value of thousands of end-user data access licenses

This world requires integration between business units. It requires the understanding that information is a most-important business asset.

Of course, we could improve our data warehouses too with data quality, metadata, deriving data and true integration. In reality, for most, this is needed as well as a change in direction that focuses on the augmentation of the data warehouse with these new concepts. Most data warehouse programs will see these changes come one way or another in the next few years.

Technorati tags: data warehouse, information management, enterprise information integration

Posted March 16, 2007 1:00 PM

I have heard a lot about, though not worked on, the federal government's data warehouses. Some things are pretty clear. It is (they are?) large. This article cites 659 million records in the FBI's database. Look at the data sources - FBI records and criminal case files, Treasury, State and Homeland Security departments and the Federal Bureau of Prisons - more than 50 FBI and other government agency sources.

And look also at the benefits of data integration - 32,222 hours for a query down to 30 minutes or less.

There are 13,000 users and 1 million queries per month.

Actually, the article, written by a Washington Post Staff Writer, reads like any other data warehouse success story or case study - except those aren't bread and milk purchase transactions in there. CRM = Citizen Relationship Management anybody?

Technorati tags: Data Warehouse

Posted December 19, 2006 9:04 AM

I was just thinking about what the unique realities of data warehousing today are. As I see it, the top realities are:

• Multiple, complex applications serving a variety of users
• Exploding data size that will continue to explode with RFID, POS, CDR, and all manner of transactional data extending back years into history
• Data latency is becoming intolerable as needs demand real-time data
• A varied set of data access tools, serving a variety of purposes, for each data warehouse
• Multiple workloads streaming into the data warehouse from varied corners of the company as well as from outside the company
• A progression towards more frequent, even continuous, loading
• Data types running the gamut beyond traditional alphanumeric types

Posted October 10, 2006 9:05 PM
