Enterprise data warehouse or star schema? (aka Inmon vs Kimball)


In my business intelligence consulting experience the question often arises whether it is better to build an enterprise data warehouse or to design a star schema database with fact tables sharing conformed dimensions. In an enterprise data warehouse or corporate information factory the organization is modeled in a central data model and most respected methodologists recommend a third normal form data model. The star schema approach is optimized for fast queries and advocates building fact tables joined to dimensions with dimensions being shared among fact tables. The dimensions use unique integer primary keys which are referenced by the fact tables as foreign keys. The star schemas are often called data marts connoting that a mart is smaller than a warehouse. The early thought leaders for these concepts are Bill Inmon for the enterprise data warehouse and corporate information factory and Ralph Kimball for the dimensional star schema architecture.

 

Now, in my opinion, a methodology is not a search for Truth; it should be evaluated on its effectiveness in allowing an organization to achieve its objectives, given its constraints and its current context. (That’s a long way of saying “it depends”.) If time and money were unlimited, the hybrid approach of building an enterprise model database and projecting to various business process oriented data marts and cubes would almost always be the most effective design. If time and money are very constrained, then loading data directly into a star schema with a companion cube would an efficient design to deliver analysis capability for a specific subject area.

 

The enterprise data model allows flexibility in modeling an organization’s data in a way that matches the organization, rather than trying to map the organization’s processes into a fact/dimension mold. The operational systems primarily model the organization at point in time — now. The enterprise data warehouse can be updated at difference frequencies from separate operational data sources and would model the organization over time and allow retrieving information about a point in time or a range of time. From this central data store, star schema databases can be loaded at regular frequencies. These data marts are modeled with specific functional purposes in mind and optimized for query analysis speed. Since these data marts can be created directly, without the central data store, the incremental cost of creating the enterprise data warehouse must be evaluated. There is an up front development cost and ongoing operational costs. The initial cost includes additional design, ETL development and disk space. The ongoing costs include the additional processing time and storage growth that the redundant data store requires. The return on investment must be shown by faster data mart building and loading, better quality information, and the ability to deliver future solutions that would have been difficult to achieve if a star schema design in the past proved to be inaccurate. The chief benefit is flexibility in architecture.

 

A good strategy to achieve a hybrid method is to build the business intelligence system one business process at a time. Design the enterprise model subset and dimensional model together and structure the ETL processes to pass the data through efficiently. With each subsequent project, refine the enterprise model and reuse conformed dimensions in the data mart when adding new fact and dimension tables. The enterprise data warehouse grows incrementally with immediate benefits of data mart reporting.

 

The star schema dimensional model with multidimensional cubes is the most efficient structure in current technology, both in terms of processing speed and end user report writing speed, although new in-memory column-oriented technology is beginning to look attractive. An enterprise data warehouse model can be the most accurate method to describe an organization over time. Combining the two now will allow very powerful analysis today with the option of using a more effective delivery technology in the future. An individual organization must decide how to deploy its resources now and prioritize the value of a data warehouse versus a decision support analysis tool. I recommend always using star schemas and multidimensional cubes today and combine with an enterprise data model when the information strategy justifies more cost for a better foundation for the future.

About these ads

4 thoughts on “Enterprise data warehouse or star schema? (aka Inmon vs Kimball)

  1. Hi Will,
    Thanks for your post.
    Regarding your comment:
    “The star schema dimensional model with multidimensional cubes is the most efficient structure in current technology, both in terms of processing speed and end user report writing speed, although new in-memory column-oriented technology is beginning to look attractive.”

    Are you acknowledging that improved column-oriented technology may render the Kimball design moot?

    If for example, a traditional relational data model is the enterprise data warehouse, and it could be made to perform extremely well in this “normalized” state, would this not be the ideal state, rather than going through all the work to create the Kimbalized view of it?

    The example technology that comes to mind is Sybase IQ.

    Please share your thoughts on this.

    Thanks,
    -Larry

    Like

    • Hi Larry,

      Thanks for your question. In addition to the query effieciencies of the star schema model, it also affords an intuitive palette from which an end user can select fields to create reports and analyses. The concepts of dimensions with hierarchies and measures that are filtered and aggregated using those structures are understood quickly. It is very effective for a reductionist analysis methodology where you break down a high level general aggregate of interesting data into smaller categories until you discover a more specific cause or opportunity.

      The in-memory purely column-oriented technology lacks the predefined structure of dimensions and hierarchies. That’s not to say that the dashboard creation process or data modeling couldn’t include these. But if they do, I think it proves that the Kimball model is not moot.

      Thanks,

      Will

      Like

  2. I agree. Both have their place. A star schema work great for building cubes. An Enterprise data warehouse is great to collect data that may be used for a variety of reasons. A hybrid approach is a great choice if you can afford it.

    Like

  3. Hi Will,

    Interesting article and one which brings out the true character of data warehousing. In my experience the user requirements have always required a bit of both. The hybrid approach has always been implemented in some sort of manner.

    A fantastic reflection indeed is the following paragraph.

    A good strategy to achieve a hybrid method is to build the business intelligence system one business process at a time. Design the enterprise model subset and dimensional model together and structure the ETL processes to pass the data through efficiently. With each subsequent project, refine the enterprise model and reuse conformed dimensions in the data mart when adding new fact and dimension tables. The enterprise data warehouse grows incrementally with immediate benefits of data mart reporting.

    Good comment also by Larry.

    Many thanks
    Ahmed

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s