Originally published 16 June 2009
Ronald would like to thank Lidwine van As, senior consultant in the field of information architecture and management at Grey Matter, for her contribution to this article.
The scope of this article is the design of development processes in a data warehouse environment in extremely large-scale organisations. I recognise the scope of a data warehouse as described by Claudia Imhoff and Colin White (August 2008, Full Circle: Decision Intelligence), as well as the continuing improvements in techniques such as appliances, virtualisation, etc.
However, as an extension to the Imhoff/White article, where a data warehouse serves tactical and strategic analysis and supports operational BI,1 I believe that an enterprise data warehouse must be able to support other types of functionality. Examples of such functionality would be data sharing with third parties, data quality procedures, and checks on source data that usually extend beyond single operational sources.
This article is, among other things, a reaction to several publications that take an extremely light-hearted view of data proliferation, the development of data products, maintenance aspects, data security and legal aspects.
If more background is needed on the architecture underlying this article, please visit http://www.prudenz.nl/nl/library, where two additional articles can be downloaded that give more colour to this article.
End user: I urgently need a summary for my manager.
Business intelligence competency centre (BICC): Are you able to produce one yourself, or would you like us to do so?
End user: I will do it myself.
BICC: Do you have access to the required data?
End user: I do for the urgent part, but for the next occasion, I am still missing data XYZ.
BICC: We have already included part of the data in our central data warehouse; what we do have will be ready by the end of this week; however, the information we do not have will first require analysis to trace its origins.
The above conversation illustrates that the development of information products in a business intelligence basic facility may vary in speed. Furthermore, such development can be characterised in several ways. An initial summary can be produced by end users themselves: they will develop it locally and will probably test the report themselves as well, or with the assistance of their managers. The second and third instances require data production. This task will be adopted centrally by the BICC, but it involves longer lead times. This is due to the standards set by the BICC with respect to system production. The process requires thorough testing and proper assessment of capacity impact, in addition to the arrangement of version and configuration management for new data marts and loading procedures, as well as meeting the demands of security policies. System production standards may be even more rigorous, as this requires a docking platform for OLTP environments.
This article addresses the way in which basic BI facilities can be used in response to matters related to business intelligence. In this context, account management, maintenance, exploitation and development are considered to be the core production processes. Specifically, this article will elaborate on the way in which data and information products are developed. These concepts are placed in their proper context and described in detail. In addition, the article distinguishes between two fundamental modes of development and addresses the concentration and positioning of activities in the development process. Finally, I will briefly address a possible model of development and how the various modes of development may collaborate within a coordinated context.
The scale of an organisation is crucial with respect to the development process and its (standardised) arrangement. It is perhaps the most underestimated aspect of organising an enterprise data warehouse (EDW) facility. Scale involves the notions of separation of functions, level of complexity of work processes, number of employees, amount of data and the way such data is used. As scale increases, so does the importance of placing the production of data and information products under a stricter regime, and of monitoring their governance. Failure to recognise these facts will jeopardise the sustainability of the EDW facility. This is a delicate balance, however, in which the human dimension must not be overlooked. After all, the human dimension is what makes business intelligence a success.
The notion of development is usually automatically associated with application development. The results of an application development project are directly relevant to the business and are highly recognisable as such. Other areas of development, such as infrastructure development, production-line development and development of production (monitoring) instruments, are only indirectly relevant to the business. For these kinds of development programmes, budgeting is often a complicated affair. Development of instruments that monitor production, such as an application for support of error processing or an application intended to make ETL production more transparent, is particularly easily set aside as a long-term project. This kind of secondary development is often difficult to grasp and account for, especially as it involves opaque blends of application, infrastructure and production-line development. Because of this, such instruments are often developed only once problems arise during the exploitation stage. Trial and error will then lead to the conclusion that production is beyond control due to the absence of vital instruments. In part, this is the nature of such affairs: it is very difficult to assess in advance what the array of support instruments should consist of. This is why it is wise to apply an evolutionary development method to these kinds of facilities. This article will take a closer look at various ways of application development in a data warehouse and business intelligence context.
Different kinds of data and information products are manufactured using data from the EDW. Data products consist solely of content. This may be either raw data from an authentic source (i.e., the central data warehouse) or enriched data (i.e., data marts). Information products represent combinations of content, functionality and presentation. Information products may include reports, online analysis, data mining, etc. Development of data and information products is performed at varying paces and therefore involves varying delivery times, as indicated in Figure 1. An important aspect of manufacturing data and information products in a next-generation EDW is to acknowledge that the longer the business intelligence basic facility has been in operation, the faster products can be delivered.
During the initial stage of the business intelligence basic facility, emphasis will be on speed #1 – releasing source data to the central data warehouse. However, as data starts to accumulate and the central data warehouse starts to fill up with data, speeds #2 and #3 will suffice to ever greater degrees.
Figure 1: Varying Paces of Production
With respect to information products, end users benefit most from as much manoeuvrability as possible, suited to their purposes. The end user will want to be able to quickly adjust, improve or reinvent products. Furthermore, different categories of end users will set different standards for manoeuvrability. Finance, for example, will probably apply a more rigorous testing process than marketing. Moreover, information products such as these are characterised by numerous individual differences and particularities (for example, unique hierarchies, grouping and dimensionality).
By contrast, development of data products is characterised by quality attributes such as delivery reliability, correctness, timeliness, security and availability. More importantly, the main characteristic of data products is that they are made available to the entire organisation.
The relationship between the manoeuvrability and individual variation of information products on the one hand, and the reliability and mass use of data products on the other, is strained. The next generation of EDW will therefore be able to distinguish between “Soviet-like” control of data product development2 and more “chaotic” development of information products.3 Data products will have to meet all standards familiar to the field of system development. In developing information products, a substantial degree of freedom will be allowed to various categories of end users. A mild type of chaos is allowed here. The BICC plays a crucial part in setting out and maintaining the rules of play intended to contain such allowed chaos.
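The distinction between strictly controlled data products and more freely developed information products can be made concrete with a small sketch. The Python below is purely illustrative and not part of the original article: the names `ProductKind`, `DATA_PRODUCT_CONTROLS`, `INFORMATION_PRODUCT_CONTROLS` and `missing_controls` are hypothetical. The data product controls are taken from the production standards mentioned earlier in the article, while the single layout rule for information products is an assumption about what a minimal set of "rules of play" might look like.

```python
from enum import Enum


class ProductKind(Enum):
    DATA = "data"                # content only: central data warehouse, data marts
    INFORMATION = "information"  # content + functionality + presentation


# Controls the article names for centrally produced data products.
DATA_PRODUCT_CONTROLS = frozenset({
    "testing",
    "capacity_impact_assessment",
    "version_and_configuration_management",
    "security_policy",
})

# Assumption: a deliberately small set of "rules of play" for information
# products; the real set would be defined by the BICC.
INFORMATION_PRODUCT_CONTROLS = frozenset({"layout_standard"})


def missing_controls(kind, completed):
    """Return the required controls that have not yet been completed."""
    if kind is ProductKind.DATA:
        required = DATA_PRODUCT_CONTROLS
    else:
        required = INFORMATION_PRODUCT_CONTROLS
    return sorted(required - set(completed))


# A data mart that has only been tested is not ready for release,
# whereas a report that follows the layout standard is.
print(missing_controls(ProductKind.DATA, {"testing"}))
print(missing_controls(ProductKind.INFORMATION, {"layout_standard"}))
```

The point of the sketch is the asymmetry: the same release check applies to both kinds of product, but the list of obligations is long for data products and intentionally short for information products.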
Application development can be approached from two perspectives: 1) style of development, and 2) concentration and positioning of development activities.
Style of development
Development may be either systematic or opportunistic in nature. In systematic system development, development is guided by a coordinating architectural description. Long-term interests are taken into account to the greatest possible extent, as sustainability is important. A great deal of attention is paid to non-functional standards, such as reusability, performance and overall conceptual consistency. Strong coherence and the mutual dependencies resulting from it call for centralised control of development activities. As a result, the distance to customers is considerable. A stable set of requirements serves to formally communicate the intended result. Much attention is paid to version and configuration management, testing and producing documentation for maintenance support. Production of functionality is staged (development, testing, acceptance, production), and transfer is at a single person’s discretion. System development can be arranged using a relatively large-scale format. This systematic approach is the only feasible option for developing large and complex systems.
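The staged production described above, with transfer between stages at a single person's discretion, can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not a description of any particular tool: the names `Stage`, `Release`, `release_manager` and `promote` are hypothetical, and real release procedures would carry far more checks.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    DEVELOPMENT = 1
    TESTING = 2
    ACCEPTANCE = 3
    PRODUCTION = 4


@dataclass
class Release:
    name: str
    release_manager: str              # the single person who may approve transfers
    stage: Stage = Stage.DEVELOPMENT
    history: list = field(default_factory=list)

    def promote(self, approver: str) -> Stage:
        """Move the release one stage forward if the approver is authorised."""
        if approver != self.release_manager:
            raise PermissionError(f"{approver} may not transfer {self.name}")
        if self.stage == Stage.PRODUCTION:
            raise ValueError(f"{self.name} is already in production")
        self.stage = Stage(self.stage + 1)   # stages cannot be skipped
        self.history.append((self.stage.name, approver))
        return self.stage


# Hypothetical usage: three approved transfers take a data mart to production.
release = Release("customer_mart_v2", release_manager="alice")
for _ in range(3):
    release.promote("alice")
print(release.stage.name)
```

Because each call advances exactly one stage and only one named person can approve it, the sketch captures both properties the paragraph attributes to systematic development: no stage is skipped, and responsibility for each transfer is unambiguous.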
Emphasis in opportunistic system development is on different elements altogether. Its objectives are quick results and direct user influence over those results. If solutions do not suffice, adjustments are rapidly implemented. If such adjustments are impossible, a new variant is produced. There is no need for a coordinating architecture to guide the process. Communication is largely informal, as developers have access to most of the available expertise on the subject or have direct access to end users during development activities. A heavily staged production method (DTAP: development, testing, acceptance, production) is unnecessary, as it will only serve to slow down the process. Practical experience has shown that large-scale development results in bureaucracy. Scale of development should therefore remain relatively small to keep communication and coordination overhead to a minimum. Usually, development will take place within the line organisation; there is no need to work by means of projects. Extensive specialisation of staff is not required, as tasks are relatively broad in scope. Stable and formally agreed-upon requirements are not needed; functionality is able to grow organically to some extent as new insights are gained.
System developers are employees trained in systematic system development. Users involved in system development adopt an opportunistic approach. In a BI practice, a blend of the development modes listed below will generally arise.
The model shown in Figure 2 relates the development modes described to each other.
Figure 2: Systematic versus Opportunistic Modes of Development
Concentration and positioning of activities
Within larger organisations, careful consideration of the concentration and positioning of development activities becomes increasingly important. By bringing people together either physically or by means of a task force, a process of mutual adaptation will arise automatically. Communication and coordination costs are kept low, and negotiations are mutually attuned more or less automatically. So, for example, if the objective is to enforce some order throughout the organisation, as is the case with an EDW, relevant activities should be concentrated at a single point within the organisation. If the intention is to fully take local circumstances into account, as is the case with development of information products, spreading development activity is preferred, as is minimising the distance to users. Architectural ambitions and the arrangement of the organisation must be well attuned to each other. Failure to do so will result in organisational issues being shifted towards architecture or vice versa.
For large-scale organisations, front-end development concentrated at a single location is hard to imagine. Communication and coordination costs would rise to levels that prevent quick responses to local circumstances. Indeed, the ability to rapidly address new demands as experience accumulates is an essential feature of BI. A certain level of autonomy is therefore a prerequisite. This will also serve as a motivating factor for local employees.
Figure 3 provides a suggestion for a development model within the context of next-generation EDW.
Figure 3: Business Intelligence Basic Facility Development Model
The various styles of development employed both locally and centrally set high standards for modes of cooperation between the parties. Product and process standardisation, and their monitoring, are essential aspects of arrangement here.
Front-end development standardisation (information products)
Standardisation may serve to impose a certain order. Standards may be set either for products or for processes. Front-end development in locally coordinated units is highly autonomous in its manufacturing of products. This is why front-end products are ill suited for standardisation. The most that can be hoped for is standardisation of layout. Attempts at standardising within the front-end environment should be directed at achieving a uniform local development process, as well as the arrangement of its underlying infrastructure. This starts with defining standard roles to which everyone conforms. Secondly, it is important to arrive at agreements on ways of testing and management. The latter become important when adjustments to the back end affect reports, which calls for coordinated action in which the back-end team and several front-end teams jointly bring about a specific change. If everyone involved adopts a different method, this will not succeed. The ability to adapt to changes within the back end is a critical factor for success. Without this ability, introducing iterative development to the data mart becomes a risky affair.
Standardisation in back-end development (data products)
Back-end development involves standardisation of products and processes alike. Systematic development introduces all sorts of process standards, and data structures require a design that is as uniform as possible. An important reason for this is that the impact of errors in data production on the organisation can take on vast proportions. The costs of poor quality are high.
At a local level, product development is released gradually. This implies a certain degree of unpredictable use of infrastructure resources. This is present in a mild form in delegated development and in an extreme form in self-service development. It may potentially result in decreased performance of the entire infrastructure. Within this context, two variants may be conceived:
In the latter case, resources (CPU cycles in particular) are shared, and optimum division of resources can be attained. This is quite demanding with respect to infrastructural arrangement, however.
The option of infrastructural compartmenting – irrespective of whether this occurs a priori – poses a complex (technical) business intelligence tooling puzzle (separation of analysis services), which also includes operating system processes (workload management) and scalability aspects (in the event of additional capacity requirements, technical scaling up is needed).
A business intelligence facility should provide the organisation with reliably delivered data, combined with highly demand-oriented, and therefore manoeuvrable, development of information products. This is precisely the combination that should be designed and that makes development within a business intelligence production environment so unique: a delicate puzzle of infrastructure arrangement, architecture standards, process modelling and organisation arrangement. This is no small feat. It grows more complex as the organisation’s scale increases and will require some stamina, but it will provide solid ground for the success of the business intelligence basic facility.