Development Processes in Data Warehouse Environments by Ronald Damhof - BeyeNETWORK Netherlands


 

Development Processes in Data Warehouse Environments

Originally published 16 June 2009

Ronald would like to thank Lidwine van As, senior consultant in the field of information architecture and management at Grey Matter, for her contribution to this article.

The scope of this article is the design of development processes in a data warehouse environment in extremely large-scale organisations. I recognise the scope of a data warehouse described by Claudia Imhoff and Colin White (August 2008, Full Circle: Decision Intelligence) as well as the continuing improvements in techniques such as appliances, virtualisation, etc.

However, as an extension to the Imhoff/White article where a data warehouse serves tactical and strategic analysis and supports operational BI,1 I believe that an enterprise data warehouse must be able to support other types of functionality. Examples include data sharing with third parties, data quality procedures and checks on source data that usually span more than one operational source.

This article – among others – is a reaction to several publications that have an extremely light-hearted view about data proliferation, the development of data products, maintenance aspects, data security and legal aspects.

If more background is needed on the architecture underlying this article, please visit http://www.prudenz.nl/nl/library where two additional articles can be downloaded that give more colour to this article.

Introduction

End user: I urgently need a summary for my manager.
Business intelligence competency centre (BICC): Are you able to produce one yourself, or would you like us to do so?
End user: I will do it myself.
BICC: Do you have access to the required data?
End user: I do for the urgent part, but for the next occasion, I am still missing data XYZ.
BICC: We have already included part of the data in our central data warehouse; what we do have will be ready by the end of this week; however, the information we do not have will first require analysis to trace its origins.

The above conversation illustrates that development of information products in business intelligence basic facilities may vary in speed. Furthermore, such development can be characterised in several ways. An initial summary can be produced by end users themselves: they will develop it themselves (locally) and will probably test the report as well, possibly with the assistance of their managers. The second and third instances require data production. This task will be (centrally) adopted by the BICC, but it involves longer lead times. This is due to standards set by the BICC with respect to system production. The process requires thorough testing and proper assessment of capacity impact, in addition to the arrangement of version and configuration management for new data mart and loading procedures as well as meeting the demands of security policies. System production standards may be even more rigorous, as this requires a docking platform for OLTP environments.

This article addresses the way in which basic BI facilities can be used in response to matters related to business intelligence. In this context, account management, maintenance, exploitation and development are considered to be production core processes. Specifically, this article will elaborate on the way in which data and information products are developed. These concepts are placed in their proper context and described in detail. In addition, the article distinguishes between two fundamental modes of development and addresses the concentration and positioning of activities in the development process. Finally, I will briefly address a possible model of development and the way in which various ways of development may collaborate within a coordinated context.

Organisation Scale

The scale of an organisation is crucial with respect to the development process and its (standardised) arrangement. It is perhaps the most underestimated aspect of organizing an enterprise data warehouse (EDW) facility. Scale involves the notions of separation of functions, work process level of complexity, number of employees, amount of data and the way such data is used. As scale increases, so does the importance of capturing production of data and information products in a stricter regime and the significance of monitoring their governance. Failure to recognise these facts will jeopardise EDW facility sustainability. This is a delicate balance, however, in which the human measure must not be overlooked. After all, the human measure is what makes business intelligence a success.

Development

The notion of development is usually automatically associated with application development. The results of an application development project are directly relevant to business and are highly recognisable as such. Further areas of development, such as infrastructure development, production-line development and development of production (monitoring) instruments, are only relevant to business in indirect ways. For these kinds of development programmes, budgeting is often a complicated affair. Development of instruments that monitor production, such as an application for support of error processing or an application intended to make ETL production more transparent, is particularly easily set aside as a long-term project. This kind of secondary development is often difficult to grasp and account for, especially as it involves opaque blends of development of applications, infrastructure and production lines. Because of this, such instruments are often only developed at the point where problems arise during the exploitation stage. Trial and error will then lead to the conclusion that production is beyond control due to the absence of vital instruments. In part, this is the nature of such affairs: it is very difficult to assess in advance what the array of support instruments should consist of. This is why it is wise to apply an evolutionary development method to these kinds of facilities. This article will take a closer look at various ways of application development in a data warehouse and business intelligence context.

Variations in Pace of Application Development

Different kinds of data and information products are manufactured using data from the EDW. Data products consist solely of content. This may be either raw data from an authentic source (i.e., central data warehouse) or enriched data (i.e., data marts). Information products represent combinations of content, functionality and presentation. Information products may include reports, online analysis, data mining, etc. Development of data and information products is performed at varying paces and, therefore, involves varying delivery times as indicated in Figure 1. An important aspect of manufacturing data and information products in a next-generation EDW is to acknowledge that the longer the business intelligence basic facility has been in operation, the faster products can be delivered.

During the initial stage of the business intelligence basic facility, emphasis will be on speed #1 – releasing source data to the central data warehouse. However, as data starts to accumulate and the central data warehouse starts to fill up with data, speeds #2 and #3 will suffice to ever greater degrees.

Figure 1: Varying Paces of Production

With respect to information products, end users will benefit particularly from as much manoeuvrability and variation as possible (suiting their purposes). The end user will want to be able to quickly adjust, improve or reinvent products. Furthermore, different categories of end users will set different standards for manoeuvrability. Finance, for example, will probably apply a more rigorous testing process than marketing. Moreover, information products such as these are characterised by numerous individual differences and particularities (for example, unique hierarchies, grouping and dimensionality).

By contrast, development of data products is characterised by quality attributes such as delivery reliability, correctness, timeliness, security and availability. More importantly, the main characteristic of data products is that they are made available to the entire organisation.

The relationship between the manoeuvrability and individual variation of information products and the reliability and mass use of data products is strained. The next generation of EDW will therefore be able to distinguish between “Soviet-like” control of data product development2 and more “chaotic” development of information products.3 Data products will have to meet all standards familiar to the field of system development. In developing information products, a substantial degree of freedom will be allowed to various categories of end users. A mild type of chaos is allowed here. A crucial part is played by the BICC in setting out and maintaining the rules of play intended to contain such allowed chaos.

Analysis of Development Modes

Application development can be approached from two perspectives: 1) style of development, and 2) concentration and positioning of development activities.

Style of development

Development may be of either a systematic or an opportunistic nature. In systematic system development, development is guided by a coordinating architectural description. Long-term interests are taken into account to the greatest possible extent, as sustainability is important. A great deal of attention is paid to non-functional standards, such as recyclability, performance and overall conceptual consistency. Strong coherence and the mutual dependencies resulting from it call for centralised control of development activities. As a result, distance to customers is considerable. A stable set of requirements serves to formally communicate the intended result. Much attention is paid to version and configuration management, testing and producing documentation for maintenance support. Production of functionality is staged (development – testing – acceptance – production), and transfer is at a single person’s discretion. System development can be arranged using a relatively large-scale format. This systematic approach is the only feasible option for developing large and complex systems.

Emphasis in opportunistic system development is on different elements altogether. Its objectives are quick results and direct user influence over those results. If solutions do not suffice, adjustments are rapidly implemented. If such adjustments are impossible, a new variant is produced. There is no need for a coordinating architecture to guide the process. Communication is largely informal, as developers have access to most of the available expertise on the subject or have direct access to end users during development activities. A heavily staged production method (DTAP: development, testing, acceptance, production) is unnecessary, as it will only serve to slow down the process. Practical experience has shown that large-scale development results in bureaucracy. Scale of development should therefore remain relatively small to keep communication and coordination overhead at a minimum. Usually, development will take place within the line; there is no need to work by means of projects. Extensive specialisation of staff is not required, as tasks are relatively broad in scope. Stable and formally agreed upon requirements are not needed; functionality is able to grow organically to some extent as new insights are gained.

System developers are employees trained in systematic system development. Users involved in system development adopt an opportunistic approach. In a BI practice, a blend of the development modes listed below will generally arise.

  • Self-service development: Employees go about putting solutions “in place” in opportunistic ways to suit their own purposes or those of their colleagues. The results arrived at can be used immediately without any physical transfer. There is no role distinction between developer and user; anyone can make adjustments, implying an environment shared by all. This mode of operation is common in the field of office automation. Data analysts and data miners usually work by means of self-service. The degree of unpredictable occupation of means is greatest in this mode of development.
  • Delegated development: Locally operating employees construct solutions in an isolated environment, which are used by their immediate colleagues. A transfer mechanism operated directly by the developer puts the results into production right from their development environment. A distinction of roles arises between the developer and users, who cooperate locally.
  • IT development: Employees trained in IT build solutions in systematic ways, using development disciplines familiar in their line of business.

The model shown in Figure 2 relates the development modes described to each other.

Figure 2: Systematic versus Opportunistic Modes of Development

Concentration and positioning of activities

Within larger organisations, careful consideration of concentration and positioning of development activities becomes increasingly important. By bringing people together either physically or by means of a task force, a process of mutual adaptation will arise automatically. Communication and coordination costs are kept low, and negotiations are mutually attuned more or less automatically. So, for example, if the objective is to enforce some order throughout the organisation, as is the case with an EDW, relevant activities should be concentrated on a single point within the organisation. If the intention is to fully take local circumstances into account, as is the case with development of information products, spreading of development activity is preferred, as well as minimising the distance to users. Architectural ambitions and arrangement of the organisation must be well attuned to each other. Failure to do so will result in organisation issues being shifted towards architecture or vice versa.

For large-scale organisations, front-end development concentrated on a single location is hard to imagine. Communication and coordination costs will rise to levels preventing quick responses to local circumstances. Indeed, the ability to rapidly address new demands as experience accumulates is an essential feature of BI. A certain level of autonomy is therefore a prerequisite. This will also serve as a motivating factor for local employees.

Development model

Figure 3 provides a suggestion for a development model within the context of next-generation EDW.

Figure 3: Business Intelligence Basic Facility Development Model

  • Centrally coordinated infrastructure development. This entails providing arranged platform capacity and BI system software for both the back end and front end, in addition to a portal environment for offering developed web functionality to clients.
  • Locally coordinated front-end development. A local unit accommodating realisation of application functionality for part of the organisation. Units are positioned as close to primary processes as possible, in order to warrant “customer intimacy” and to allow for efficient, suitable front-end development. Two modes of development coexist within local units: self-service development and delegated development.
  • Centrally coordinated EDW development. All IT new construction projects and application management activities related to source release, central data warehouse and data marts constructed from it.

Development “in Control”

The various styles of development employed both locally and centrally set high standards for modes of cooperation between the parties. Product and process standardisation, and their monitoring, are essential aspects of arrangement here.

Front-end development standardisation (information products)

Standardisation may serve to compel a certain order. Standards may either be set for products or for processes. Locally coordinated unit front-end development is highly autonomous in its manufacturing of products. This is why front-end products are ill suited for standardisation. The maximum that can be desired is standardisation of layout. Attempts at standardising within the front-end environment should be directed at achieving a uniform local development process as well as the arrangement of its underlying infrastructure. This starts with acknowledging standard roles to which everyone conforms. Secondly, it is important to arrive at agreements on ways of testing and management. The latter will become important as adjustments to back-end affairs may affect reports, presupposing coordinating action in which the back-end team and several front-end teams jointly bring about a specific change. If all people involved adopt different methods, this will not succeed. The ability to adapt to changes within the back end is a critical factor for success. Without this ability, introduction of iterative development to the data mart becomes a risky affair.

Standardisation in back-end development (data products)

Back-end development involves standardisation of products and processes alike. Applied systematic development introduces all sorts of process standards, and data structures require a design that is as uniform as possible. An important reason for this is that the impact of errors in data production on the organisation can take on vast proportions. The costs of poor quality are high.

Exploitation “in Control”

At a local level, product development is released gradually. This implies a certain degree of unpredictable occupation of means with respect to infrastructure. This is present in delegated development in a mild form and in an extreme form in self-service development. This may potentially result in decreased performance of the entire infrastructure. Within this context, two variants may be conceived:

  • A priori infrastructural compartmenting of self-service development. In other words: isolating self-service with respect to infrastructure.4 Each locally coordinated unit has agreed upon a service level with centrally developed infrastructure and architecture. This will contain agreements on occupation of means freely available to local units, for example. If these are exceeded,5 new (price) agreements are made.
  • The option of applying the principle of gradual priority degradation. The longer a requested query keeps running, the lower the priority it is assigned.

In the latter case, resources (CPU cycles in particular) are shared, and optimum division of resources can be attained. This is quite demanding with respect to infrastructural arrangement, however.
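The principle of gradual priority degradation can be illustrated with a minimal sketch. This is not a description of any particular workload manager; the names `RunningQuery`, `degraded_priority` and `schedule_order`, as well as the starting priority, step size and floor, are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningQuery:
    """A query currently executing on the shared BI infrastructure (hypothetical model)."""
    name: str
    elapsed_seconds: float


def degraded_priority(elapsed_seconds: float,
                      start_priority: int = 10,
                      step_seconds: float = 60.0,
                      min_priority: int = 1) -> int:
    """Return a priority that drops by one level per `step_seconds` of runtime.

    All default values are illustrative assumptions, not taken from the article.
    """
    levels_lost = int(elapsed_seconds // step_seconds)
    # Never drop below the floor, so long-running queries still make progress.
    return max(min_priority, start_priority - levels_lost)


def schedule_order(queries: list[RunningQuery]) -> list[str]:
    """Order queries so that those with the highest remaining priority come first."""
    ranked = sorted(queries,
                    key=lambda q: degraded_priority(q.elapsed_seconds),
                    reverse=True)
    return [q.name for q in ranked]
```

In this sketch a short report started a few seconds ago is ranked ahead of a data mining job that has been running for fifteen minutes, while the floor value keeps the long-running job from being starved entirely.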

The option of infrastructural compartmenting – irrespective of whether this occurs a priori – poses a complex (technical) business intelligence tooling puzzle (separation of analysis services), which also includes operating system processes (workload management) and scalability aspects (in the event of additional capacity requirements, technical scaling up is needed).
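The a priori compartmenting variant rests on a service level per local unit and on measurable resource usage. A minimal sketch of the check that would trigger new (price) agreements might look as follows; the `ServiceLevel` fields, the choice of CPU hours per month as the unit of measurement and the function name `units_over_quota` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    """Agreed resource allowance for one local unit (hypothetical fields)."""
    unit: str
    cpu_hours_per_month: float


def units_over_quota(agreements: list[ServiceLevel],
                     measured_cpu_hours: dict[str, float]) -> dict[str, float]:
    """Return, per unit, how many CPU hours it used beyond its agreed allowance.

    Units without a measurement are treated as having used nothing.
    """
    overruns = {}
    for sla in agreements:
        used = measured_cpu_hours.get(sla.unit, 0.0)
        excess = used - sla.cpu_hours_per_month
        if excess > 0:
            overruns[sla.unit] = excess
    return overruns
```

A unit that stays exactly at its allowance is not reported; only usage above the agreed level appears in the result, which is the point at which the agreement would be renegotiated.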

A business intelligence facility should provide the organisation with reliable (and reliably delivered) data, combined with highly demand-oriented, and therefore manoeuvrable, development of information products. This is precisely the combination that must be designed and that makes development within business intelligence production so unique: a delicate puzzle of infrastructure arrangement, architecture standards, process modelling and organisation arrangement. This is no small feat. It grows more complex as the organisation’s scale increases and will require some stamina, but it will provide solid ground for the success of the business intelligence basic facility.

References:

  1. Operational business intelligence as defined by Claudia Imhoff and Colin White cannot be served by data warehouse architectures. Techniques that are characterised as (complex) event processing are needed. The data warehouse can, however, deliver support to this event processing.
  2. Also known as back end.
  3. Also known as front end.
  4. The second article of this series of three addressed this in more detail, for instance by using appliances for unpredictable occupation of means by specific target groups (e.g., data mining).
  5. Occupation of means is required to be measurable and therefore to be part of the production instruments array of business intelligence facilities.


  • Ronald Damhof
    Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

    His areas of focus include:

    1. Data management, including data quality, data governance and data warehousing;
    2. Enterprise architectural principles;
    3. Exploiting data to its maximum potential for decision support.
    Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), Certified Data Vault Grandmaster (only person in the world to have this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.
 
