
Enterprise data warehouse or star schema? (aka Inmon vs Kimball)


In my business intelligence consulting experience, the question often arises whether it is better to build an enterprise data warehouse or to design a star schema database with fact tables sharing conformed dimensions. In an enterprise data warehouse, or corporate information factory, the organization is modeled in a central data model, and most respected methodologists recommend a third normal form data model. The star schema approach is optimized for fast queries and advocates building fact tables joined to dimensions, with dimensions shared among fact tables. The dimensions use unique integer primary keys, which the fact tables reference as foreign keys. The star schemas are often called data marts, connoting that a mart is smaller than a warehouse. The early thought leaders for these concepts are Bill Inmon, for the enterprise data warehouse and corporate information factory, and Ralph Kimball, for the dimensional star schema architecture.
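
To make the idea of fact tables sharing a conformed dimension concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and sample rows (dim_customer, fact_sales, fact_support_calls) are hypothetical and chosen only for illustration; a real data mart would carry many more attributes and a date dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed dimension with an integer surrogate key (hypothetical schema).
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key used by every fact table
    customer_id  TEXT NOT NULL,         -- business key from the source system
    region       TEXT NOT NULL
);
-- Two fact tables for different business processes share the same dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    amount       REAL NOT NULL
);
CREATE TABLE fact_support_calls (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    minutes      INTEGER NOT NULL
);
INSERT INTO dim_customer VALUES (1, 'C-100', 'East'), (2, 'C-200', 'West');
INSERT INTO fact_sales VALUES (1, 250.0), (1, 100.0), (2, 75.0);
INSERT INTO fact_support_calls VALUES (1, 30), (2, 45), (2, 15);
""")

def total_by_region(fact_table, measure):
    """Sum one measure per region by joining a fact table to the shared dimension.

    fact_table and measure are trusted constants in this sketch, not user input.
    """
    sql = f"""
        SELECT d.region, SUM(f.{measure})
        FROM {fact_table} AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.region
        ORDER BY d.region
    """
    return conn.execute(sql).fetchall()

sales_by_region = total_by_region("fact_sales", "amount")
calls_by_region = total_by_region("fact_support_calls", "minutes")
```

Because both fact tables reference the same customer_key, sales and support results line up by region without any mapping between the two subject areas.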

 

Now, in my opinion, a methodology is not a search for Truth; it should be evaluated on its effectiveness in allowing an organization to achieve its objectives, given its constraints and its current context. (That’s a long way of saying “it depends”.) If time and money were unlimited, the hybrid approach of building an enterprise model database and projecting to various business-process-oriented data marts and cubes would almost always be the most effective design. If time and money are very constrained, then loading data directly into a star schema with a companion cube would be an efficient design to deliver analysis capability for a specific subject area.

 

The enterprise data model allows flexibility in modeling an organization’s data in a way that matches the organization, rather than trying to map the organization’s processes into a fact/dimension mold. The operational systems primarily model the organization at one point in time: now. The enterprise data warehouse can be updated at different frequencies from separate operational data sources; it models the organization over time and allows retrieving information about a point in time or a range of time. From this central data store, star schema databases can be loaded at regular frequencies. These data marts are modeled with specific functional purposes in mind and optimized for query analysis speed. Since these data marts can also be created directly, without the central data store, the incremental cost of creating the enterprise data warehouse must be evaluated. There is an up-front development cost and there are ongoing operational costs. The initial cost includes additional design, ETL development and disk space. The ongoing costs include the additional processing time and storage growth that the redundant data store requires. The return on investment must be shown by faster data mart building and loading, better quality information, and the ability to deliver future solutions that would have been difficult to achieve if an earlier star schema design proved to be inaccurate. The chief benefit is flexibility in architecture.
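
As a rough illustration of retrieving information about a point in time, the sketch below keeps history rows with a validity interval and asks which row was in effect on a given date. The table employee_department_history, its columns and the sample data are assumptions made for this example; it is one common way to store history, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical history table: one row per period in which a fact held true.
CREATE TABLE employee_department_history (
    employee_id TEXT NOT NULL,
    department  TEXT NOT NULL,
    valid_from  TEXT NOT NULL,  -- ISO date, inclusive
    valid_to    TEXT            -- ISO date, exclusive; NULL means still current
);
INSERT INTO employee_department_history VALUES
    ('E-1', 'Sales',     '2009-01-01', '2010-07-01'),
    ('E-1', 'Marketing', '2010-07-01', NULL);
""")

def department_as_of(employee_id, as_of):
    """Return the department in effect on the ISO date as_of, or None."""
    row = conn.execute(
        """
        SELECT department
        FROM employee_department_history
        WHERE employee_id = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        """,
        (employee_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

then = department_as_of("E-1", "2010-03-15")
now = department_as_of("E-1", "2010-07-01")
before_hire = department_as_of("E-1", "2008-01-01")
```

ISO-formatted date strings compare correctly as text, which is why plain string comparison is enough here. An operational system would typically hold only the current row, so the first and third questions could not be answered from it.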

 

A good strategy to achieve a hybrid method is to build the business intelligence system one business process at a time. Design the enterprise model subset and dimensional model together and structure the ETL processes to pass the data through efficiently. With each subsequent project, refine the enterprise model and reuse conformed dimensions in the data mart when adding new fact and dimension tables. The enterprise data warehouse grows incrementally with immediate benefits of data mart reporting.
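
One way to picture passing data from the enterprise model subset into a data mart dimension is sketched below. The edw_ and dim_ table names are hypothetical, and the load handles only new customers; changed attributes (slowly changing dimensions) are deliberately left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Enterprise model subset, normalized (hypothetical tables).
CREATE TABLE edw_region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL
);
CREATE TABLE edw_customer (
    customer_id TEXT PRIMARY KEY,
    region_id   INTEGER NOT NULL REFERENCES edw_region(region_id)
);
-- Data mart dimension, denormalized, with its own surrogate key.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id  TEXT NOT NULL UNIQUE,
    region       TEXT NOT NULL
);
INSERT INTO edw_region VALUES (1, 'East'), (2, 'West');
INSERT INTO edw_customer VALUES ('C-100', 1);
""")

def dim_row_count():
    return conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]

def load_dim_customer():
    """Add customers missing from the dimension; return how many were added."""
    before = dim_row_count()
    conn.execute("""
        INSERT INTO dim_customer (customer_id, region)
        SELECT c.customer_id, r.region_name
        FROM edw_customer AS c
        JOIN edw_region AS r ON r.region_id = c.region_id
        WHERE c.customer_id NOT IN (SELECT customer_id FROM dim_customer)
    """)
    conn.commit()
    return dim_row_count() - before

first_load = load_dim_customer()
# A later project adds a customer to the enterprise model.
conn.execute("INSERT INTO edw_customer VALUES ('C-200', 2)")
second_load = load_dim_customer()
dim_rows = conn.execute(
    "SELECT customer_key, customer_id, region FROM dim_customer ORDER BY customer_key"
).fetchall()
```

Existing rows keep their surrogate keys across loads, so fact tables added in later projects can reference the same conformed dimension.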

 

The star schema dimensional model with multidimensional cubes is the most efficient structure in current technology, both in terms of processing speed and end user report writing speed, although new in-memory column-oriented technology is beginning to look attractive. An enterprise data warehouse model can be the most accurate method to describe an organization over time. Combining the two now will allow very powerful analysis today, with the option of using a more effective delivery technology in the future. An individual organization must decide how to deploy its resources now and prioritize the value of a data warehouse versus a decision support analysis tool. I recommend always using star schemas and multidimensional cubes today, and combining them with an enterprise data model when the information strategy justifies more cost for a better foundation for the future.
