There are three reasons to use an OLAP cube in addition to your relational database – performance, drill-down functionality and availability of software tools.
Online analytical processing (OLAP) is a technique for quickly analyzing a measure, e.g. profit margin, by multiple categories or dimensions, e.g. customer, region, fiscal period and product line. Typically the end user software has capabilities to drag categories to rows and columns and aggregate the measure at each intersection of a row and column (often called a cross tab report). This is similar to the familiar spreadsheet format. This numeric format can usually also be represented in the form of a chart or graph. The real power of OLAP is the ability to drill down on a category to see more details. For example, you might drill down on a state to see details by city.
An OLAP cube is a technology that stores data in an optimized way to provide quick response to queries by dimension and measure. Most cubes pre-aggregate the measures by the different levels of categories in the dimensions to enable the quick response time. End user software will make querying a cube very easy, but developers, who may be accustomed to using SQL language, will need to learn a new language – MDX (Multi-Dimensional eXpressions).
The standard design for a relational database source for this analysis is called a star schema. A fact table is related to multiple dimensions and this can be represented graphically in a form of a star. A star schema design will support reporting and analysis by dimensions for measures in the cross tab and graphical formats without using an OLAP cube. Why would you go to the time, expense, disk space, skill development and increased maintenance to also build a cube when a relational database will support this analysis?
There are three reasons for adding a cube to your solution:
- Performance. A cube’s structure and pre-aggregation allows it to provide very fast responses to queries that would have required reading, grouping and summarizing millions of rows of relational star-schema data. The drilling and slicing and dicing that an analyst would want to perform to explore the data would be immediate using a cube but could have lengthy pauses when using a relational data source.
- Drill down functionality. Many reporting software tools will automatically allow drilling up and down on dimensions with the data source is an OLAP cube. Some tools, like IBM Cognos’ Dimensionally Modeled Relational model will allow you to use their product on a relational source and drill down as if it were OLAP but you would not have the performance gains you would enjoy from a cube.
- Availability of software tools. Some client software reporting tools will only use an OLAP data source for reporting. These tools are designed for multi-dimensional analysis and use MDX behind the scenes to query the data.
OLAP cube technology will cost more in terms of development, learning and project time but will return benefits in fast response time to analyze large amounts of data. This capability can result in insights that drive actions and decisions that enable very large organizational productivity, cost saving or revenue increasing gains.
Hi Will,
Thank you for your helpful article. Are OLAP cubes worth considering where there are no calculated values but many ways we wish to report on the data?
We are a large training provider and need to interactively report on a large number of exam results. Currently we use Microsoft Reporting Services with an offline denormalised results table (1/2 million rows) with many indexes to support the different ways we want to report by (by client, teaching centre, department, date, manager etc).
We intend to add another 3 million rows in the future.
LikeLike
Nicholas, In your case I would try to load it into QlikView: it will keep the un-aggregated data in memory (very performant) and will not precalculate anything, giving you the flexibility to analyze by any dimension. You can download QlikView Personal Edition from their website.
LikeLike
Hi Nicholas,
While the purpose of OLAP cubes is to analyze aggregated data by different categories, it is possible to use them to categorize events, states and other data that’s not numeric by nature. Typically the reporting involves counts by different slices and hierarchies with the ability to drill through to the details from a particular count. Cubes are not really that convenient for list reports that have a number of attributes as columns and a large number of rows; Reporting Services is more appropriate for the detailed drill through reports.
Using QlikView is a good suggestion if you want to click on filters to limit the data shown on a dashboard type report. QlikView doesn’t use predefined hierarchies with drill down paths; it lets you select filters in any combination and shows the results very quickly. There are a number of fun demos on their website. Their menu system to select a demo is actually a QlikView application. They store data in memory in a proprietary column-based associative format. A word is only stored once even if it occurs in a million rows so there is a lot of compression.
LikeLike
From a performance standpoint, is there a limit to the max number of detail rows on a drill-thru that should be set?
For example, in the scenario below, is there a reasonable expectation on results response time back to the user?
filter hierarchy: org/state/customer/status/amt owed
database records at customer level: 2 million rows
LikeLike
I see Qlikview mentioned several times above and have used the tool myself. I do agree that it is quite good for a basic parameter style reporting interface but personally i would recommend looking at a tool such as Web Intelligence from Business Objects. We use this tool internally in conjunction with Data Federator to pull data into memory and it is extremely fast. The advantage here over Qlikview is that Web Intelligence provides a semantic layer where i can drag and drop to create my own queries and drill through as i please in comparison to Qlikview where i just choose from a predefined set of parameters to change whats displayed on the report. Good luck with your projects!
LikeLike
Hi Will,
Thanx for this useful article ,,, i have some questions (if u don’t mind 🙂 ) ,,
1 – Are cubes used to address specific areas ?? such as using cube for sales ? another one for customers ?? and so on ?!!!
2 – I am not sure if i understand the dimensions,, are they targeting specific aspect ? like time dimension that can be used with other dimensions to form cube ??
Thanx in advance
LikeLike
Hi Majid,
Cubes address a subject area for a business. It is structured such that it allows you to analyze facts and measures, like sales transactions or financial transactions in the context of dimensions, like Customer, Product, or Account. The dimensions usually acts a qualifiers that you use to classify or filter the data. For example, in “I want to see sales in the Eastern Region by Customer Name.”, sales would be the fact, region could be a dimension and customer could be a dimension. The business need is to filter on region and report sales by customer name.
Dimensions can be shared among cubes. For example, the Product dimension may be used in both a Sales cube and an Inventory cube.
I hope this helps.
Thanks,
Will
LikeLike
Good discussion.
LikeLike
Thanks. It’s realy interesting
LikeLike
Thanks for sharing!
LikeLike