tricky question, and every organisation answers it differently.
It is, however, an important one. Those who own the data are responsible
for its quality, right? That is not a light-hearted question, given today's compliance pressure and the need for clear responsibilities regarding data. Those who own the data are responsible for
it wherever it is used within the organisation. Is the latter really true?
In the article I published in September 2008,
I strongly advise registering authentic, factual data in the Central
Data Warehouse. Business rules should be implemented downstream, after
the Central Data Warehouse.
Who owns the data in the Central Data
Warehouse? Is it the BICC? No, they are not the owner. Since 'we' store
authentic, factual data in the Central Data Warehouse (we do not change,
enrich or integrate* it!), the owner of the data should still be the same
as the owner of the source. Let me put it more simply:
the people who create the authentic data also own the data, wherever it goes.
Let me just highlight the importance of architecture here. What would
happen if I built a classic, old-style hub-and-spoke data warehouse, in which a
central data warehouse is developed where the data is neatly
integrated, cleansed, etc.? In other words, the data is changed on the way
into the Central Data Warehouse, or maybe even on the way into
the staging area! Well, taking the above rule into account, the data
warehouse team, which creates/changes the data, now becomes the owner.
The result of the latter, more classic, data warehouse architecture is
that there are two owners of the data, and ownership is decoupled between
the data created by the application and the data coming into staging.
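The ownership rule above ("whoever creates or changes the data owns it") can be made concrete with a small model. This is a minimal sketch, assuming a hypothetical `Hop` record for each step a dataset takes and a `changes_data` flag that marks cleansing, enrichment or integration; none of these names come from the original articles. It contrasts the new generation set-up, where authentic data is copied unchanged, with the classic hub/spoke set-up, where the data is changed on the way into the Central Data Warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One step in a dataset's journey (hypothetical model)."""
    target: str          # where the data lands, e.g. "staging"
    performed_by: str    # team that builds/runs this hop
    changes_data: bool   # True if the hop cleanses, enriches or integrates


def owner_at(source_owner: str, hops: list[Hop]) -> dict[str, str]:
    """Return the owner of the data at each target along the path.

    A hop that only copies authentic data keeps the current owner;
    a hop that changes the data makes its performer the new owner.
    """
    owner = source_owner
    owners: dict[str, str] = {}
    for hop in hops:
        if hop.changes_data:
            owner = hop.performed_by
        owners[hop.target] = owner
    return owners


# New generation: authentic data is copied unchanged into staging and the CDW.
new_gen = [
    Hop("staging", "DWH team", changes_data=False),
    Hop("central_dwh", "DWH team", changes_data=False),
]

# Classic hub/spoke: data is cleansed and integrated on the way into the CDW.
classic = [
    Hop("staging", "DWH team", changes_data=False),
    Hop("central_dwh", "DWH team", changes_data=True),
]

print(owner_at("Sales app owner", new_gen))
print(owner_at("Sales app owner", classic))
```

In the classic path the Central Data Warehouse ends up owned by the DWH team while staging is still owned by the source owner: the two-owner situation described above.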
The above is a classic case in the majority of organisations. It
creates massive problems in governing your data warehouse, especially
your change management. What happens if a change occurs in the source
data? Is the source owner responsible for cascading the change to the
data warehouse? Well, he and the data warehouse team probably have some
kind of SLA stipulating that the source owner must signal the data warehouse
team that a change is imminent... and the Data Warehouse team needs to go
to work. What happens if you have 50 sources, or even 100? What happens
if the changes are big? Chaos, and the sustainability of your data warehouse is at great risk.
So what does this look like in the new generation of Enterprise Data Warehouses?
What happens if a change occurs in the data? The owner of the data
does an impact analysis on all the interfaces he owns and is
responsible for. In the new generation EDW, he is also responsible for
engineering the change all the way up to the Central Data Warehouse! This is
a far more manageable governance model for change management in
an Enterprise Data Warehouse.
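The source owner's impact analysis amounts to walking the data lineage downstream from the changed source. The sketch below is an illustration only: it assumes lineage is available as a simple adjacency map, and the dataset names and the `dm.` prefix used to recognise datamarts are hypothetical conventions, not part of the original articles. Everything upstream of the datamarts (staging and the Central Data Warehouse) falls in the scope the source owner engineers; the datamarts are the downstream consumers to inform.

```python
from collections import deque


def impact_analysis(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every dataset downstream of `changed`, in breadth-first order."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: source -> staging -> Central Data Warehouse -> datamarts.
lineage = {
    "crm.customers": ["stg.crm_customers"],
    "stg.crm_customers": ["cdw.customer"],
    "cdw.customer": ["dm.sales", "dm.finance"],
}

impacted = impact_analysis(lineage, "crm.customers")
# Assumed convention: anything not prefixed "dm." is staging or CDW,
# i.e. within the scope the source owner engineers the change into.
owner_scope = [d for d in impacted if not d.startswith("dm.")]
consumers = [d for d in impacted if d.startswith("dm.")]
print("owner engineers:", owner_scope)
print("inform demand side:", consumers)
```

With 50 or 100 sources, each owner runs this walk only for their own sources, which is what makes the model scale better than a single DWH team absorbing every change.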
Is the source owner also the owner of the
data coming into the datamarts? Yes, but this is a tricky one.
Integration and cleansing take place downstream, between the Central
Data Warehouse and the datamarts. In the second article Lidwine van As
and I published in Database Magazine (November 2008), we state that this
part of the EDW is driven by demand (whereas data getting into the
Central Data Warehouse is pushed by supply); in other words, those who
demand the data put up the requirements: the business rules they want to apply
to the factual data.
In datamarts, data is changed according to rules defined by a user. Is the initial owner of the data still accountable for this data? Well... yes, they are. But the rules applied on the way into the datamarts are not their responsibility. If somebody comes up with a fantastic new
rule for calculating turnover, I would say that Finance
should own this formula. But I am embarking on a
whole other subject here: definition ownership... let's
not go there.
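The split between owning the facts and owning the rule applied to them can be kept visible in the datamart itself. This is a minimal sketch under stated assumptions: the `BusinessRule` record, the `build_mart_value` helper and the turnover formula (invoiced amounts minus returns) are all hypothetical illustrations, not a definition from the original articles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusinessRule:
    """A demand-side rule applied between the CDW and a datamart (hypothetical)."""
    name: str
    definition_owner: str                 # who owns the formula, e.g. Finance
    apply: Callable[[list[dict]], float]  # computes a datamart value from facts


def build_mart_value(facts: list[dict], data_owner: str, rule: BusinessRule) -> dict:
    """Derive a datamart figure while recording both accountabilities."""
    return {
        "measure": rule.name,
        "value": rule.apply(facts),
        "data_owner": data_owner,             # still accountable for the facts
        "rule_owner": rule.definition_owner,  # accountable for the formula
    }


turnover = BusinessRule(
    name="turnover",
    definition_owner="Finance",
    # Hypothetical formula: invoiced amounts minus returns.
    apply=lambda rows: sum(r["amount"] for r in rows) - sum(r["returns"] for r in rows),
)

facts = [{"amount": 100.0, "returns": 10.0}, {"amount": 250.0, "returns": 0.0}]
print(build_mart_value(facts, "Sales app owner", turnover))
```

If the figure looks wrong, the two owner fields point to where to look: the source owner for the facts, Finance for the formula.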
IT Artifact and Boundary: Data Warehouse and Business Intelligence
As you can see in the above pictures, the boundary of the IT system has changed. The boundary of an IT system is not the system itself; it is determined by the propagation of its data. The data warehouse should be regarded as an integral part of the IS environment. You could also say that the data warehouse is evolving into a functional interface on top of the operational-transactional systems. This functional interface rationalizes the data structures of these systems and can thereby serve other functions, such as Business Intelligence. It does not have to be Business Intelligence per se! It can also be used for data sharing with third parties (e.g. co-making), data quality projects, operational control, accounting (remember: it's factual, auditable data), etc.
This shift in the system boundary also acknowledges that building sustainable data structures differs from building successful Business Intelligence systems. The two differ in competencies and skills, organisational design (account management, exploitation, development, maintenance), technical architecture (tools, version management, security, performance, etc.) and cultural aspects.
A small side note on metadata, business metadata
in particular (definitions, domain values, etc.). I see a lot of
EDW architectures where the responsibility for registering,
administering and publishing the business metadata is put on the
shoulders of the DWH team. Given the above graphic of the new generation
EDW and the argument of this blog post, this is not the right approach.
Those who create the authentic data should also take care of the
business metadata. It's their job! And yes, this extends the Enterprise
Data Warehouse big time! It entails Enterprise Data Management:
all data, and probably also all services, within the organisation. Metadata
administration should be an existing function in any organisation
(remember the scale: I am not talking about midsize companies here!).
The Data Warehouse team, however, does have a responsibility for
registering, administering and publishing the business metadata for the
- Ownership of these rules/definitions can transfer to a definition owner or to the user who is requesting the data.
This is a blog post, so I am allowed not to be
thorough, scientifically correct, etc. So I am leaving out a lot of
nuances, restrictions and prerequisites. Let me just give you a few.
There are some major Enterprise Architecture principles that need to
be considered here:
1 - Data must be decoupled from its application/process (huge one!!!)
2 - Ownership of data cascades all the way through its use within the organisation
3 - The data warehouse hub MUST register authentic, factual data
4 - The data warehouse hub must be designed in such a way that it
supports a federated deployment (worth a whole new blog post); without
this one, forget it.
5 - Interfaces between source data and staging must be standardized
6 - Metadata Administration must be implemented enterprise wide
7 - Release management for the CDW and datamarts needs to be set up
8 - ...
I have described most of these prerequisites in two articles, which can be downloaded here.
Just a small brain dump I wanted to share with you guys; give me your 2 cents on this one.
* A small degree of data integration is necessary.