Errors in the OLAP storage engine: A duplicate attribute key has been found when processing
Duplicate attribute key error
A frequent error when processing an SSAS cube begins “Errors in the OLAP storage engine: A duplicate attribute key has been found when processing”. I have found that it is usually caused by one of two conditions:
Typical Causes
- An actual duplicate key is found where a value for a child in a hierarchy occurs with two different parent values.
- A value for a data item is NULL.
Solution for actual duplicate key
The former is easier to discover. Look for it when the field in question is part of a hierarchy: a child can have only one parent. It can be resolved by using a compound key for the child that includes the parent key, so that the same child value under two different parents is treated as two distinct members.
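One way to confirm this case before changing the cube is to query the dimension source for child values that appear under more than one parent. The following is a minimal sketch only; the view vwGeography and the columns CITY and STATE are hypothetical stand-ins for your own child and parent fields.

```sql
-- Hypothetical source: vwGeography(CITY, STATE), where CITY is the child
-- attribute and STATE is its parent in the hierarchy.
-- Returns each child value that occurs under more than one parent.
SELECT CITY,
       COUNT(DISTINCT STATE) AS ParentCount
FROM vwGeography
GROUP BY CITY
HAVING COUNT(DISTINCT STATE) > 1
ORDER BY CITY;
```

Any row returned is a child value that occurs with two or more different parents. Note that COUNT(DISTINCT ...) ignores NULL parents, so a NULL parent will not show up in this check.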
Solution for null data
The latter can be confounding because the error message is often misleading. It can show a value in the error message without indicating that the real underlying cause is a NULL value. This can be repaired by using the COALESCE function to replace a NULL with a value such as 'Unknown' or simply '' (an empty string).
A good practice is to use views as the source of data to a cube, rather than making any modifications within the SSAS data source view. It is in the view that I add the COALESCE function around the data item.
For example, if there are nulls in the MIDDLE_NAME data item, the view
CREATE VIEW vwPeople AS
SELECT FIRST_NAME AS [First Name],
       COALESCE(MIDDLE_NAME, '') AS [Middle Name], -- empty string replaces NULL
       LAST_NAME AS [Last Name]
FROM PEOPLE
will use the MIDDLE_NAME value for [Middle Name] unless it is NULL and then will use the empty string instead.
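To check whether NULLs are the likely culprit before editing the view, a quick count against the source table is usually enough. This sketch reuses the PEOPLE table and MIDDLE_NAME column from the example; substitute the table and column behind the attribute named in your own error message.

```sql
-- Counts the rows in PEOPLE whose MIDDLE_NAME is NULL.
-- A non-zero result means the attribute needs COALESCE in the source view.
SELECT COUNT(*) AS NullMiddleNames
FROM PEOPLE
WHERE MIDDLE_NAME IS NULL;
```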
When to use an OLAP cube?
There are three reasons to use an OLAP cube in addition to your relational database – performance, drill-down functionality and availability of software tools.
Online analytical processing (OLAP) is a technique for quickly analyzing a measure, e.g. profit margin, by multiple categories or dimensions, e.g. customer, region, fiscal period and product line. Typically the end user software has capabilities to drag categories to rows and columns and aggregate the measure at each intersection of a row and column (often called a cross tab report). This is similar to the familiar spreadsheet format. This numeric format can usually also be represented in the form of a chart or graph. The real power of OLAP is the ability to drill down on a category to see more details. For example, you might drill down on a state to see details by city.
An OLAP cube is a technology that stores data in an optimized way to provide quick response to queries by dimension and measure. Most cubes pre-aggregate the measures by the different levels of categories in the dimensions to enable the quick response time. End user software will make querying a cube very easy, but developers, who may be accustomed to SQL, will need to learn a new language – MDX (Multi-Dimensional eXpressions).
The standard design for a relational database source for this analysis is called a star schema. A fact table is related to multiple dimensions and this can be represented graphically in a form of a star. A star schema design will support reporting and analysis by dimensions for measures in the cross tab and graphical formats without using an OLAP cube. Why would you go to the time, expense, disk space, skill development and increased maintenance to also build a cube when a relational database will support this analysis?
There are three reasons for adding a cube to your solution:
- Performance. A cube’s structure and pre-aggregation allow it to provide very fast responses to queries that would otherwise require reading, grouping and summarizing millions of rows of relational star-schema data. The drilling, slicing and dicing that an analyst would want to perform to explore the data would be immediate using a cube but could have lengthy pauses when using a relational data source.
- Drill down functionality. Many reporting software tools will automatically allow drilling up and down on dimensions when the data source is an OLAP cube. Some tools, like IBM Cognos with its Dimensionally Modeled Relational (DMR) model, will allow you to use their product on a relational source and drill down as if it were OLAP, but you would not have the performance gains you would enjoy from a cube.
- Availability of software tools. Some client software reporting tools will only use an OLAP data source for reporting. These tools are designed for multi-dimensional analysis and use MDX behind the scenes to query the data.
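To make the performance point concrete, the relational query that a cube's pre-aggregation replaces looks something like the sketch below. The table and column names (FactSales, DimCustomer, DimDate, ProfitAmount and so on) are hypothetical and only illustrate a typical star schema.

```sql
-- Hypothetical star schema: FactSales joined to DimCustomer and DimDate.
-- Without a cube, every cross tab cell is computed by scanning and
-- grouping fact rows at query time.
SELECT d.FiscalYear,
       c.Region,
       SUM(f.ProfitAmount) AS TotalProfit
FROM FactSales AS f
JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
JOIN DimDate AS d ON d.DateKey = f.DateKey
GROUP BY d.FiscalYear, c.Region
ORDER BY d.FiscalYear, c.Region;
```

Each drill down, for example from region to city, means running another query of this shape against the fact table, which is where the pauses come from on large data sets.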
OLAP cube technology will cost more in terms of development, learning and project time but will return benefits in fast response time when analyzing large amounts of data. This capability can result in insights that drive actions and decisions leading to very large gains in organizational productivity, cost savings or revenue.
Which OLAP cube should I use, Microsoft SSAS or IBM Cognos PowerCube (Powerplay)?
An Online Analytical Processing (OLAP) cube is a powerful tool to analyze large amounts of data quickly. There are a number of products available that can create an OLAP cube. This article will discuss some of the factors that should be considered in choosing an OLAP technology and evaluate two major vendors’ products – Microsoft SQL Server Analysis Services 2008 (SSAS) and IBM Cognos PowerCube.
Three factors that are important in considering OLAP technologies are:
1. Your existing database and reporting technologies.
2. The amount of data that will be brought into the cube.
3. The skill set of workers that will design and create cubes.
Microsoft SQL Server
If your data warehouse platform is based on Microsoft SQL Server, especially if you are also using SQL Server Integration Services (SSIS), then SSAS is a natural extension of the architecture. SSIS can be used to load and process the SSAS cube. SQL Server data will be optimized for loading into the SSAS cube. (It is important to note that SSAS can also source from Oracle, DB2, Access and Teradata data sources.) This Microsoft database environment can also be used as source data for an IBM Cognos cube, but the Cognos cube cannot be used with Microsoft reporting tools such as Reporting Services. IBM Cognos does have a software product that allows PowerCube browsing with Excel; Microsoft SSAS cubes can be browsed natively with Excel pivot tables.
SharePoint
If your portal and collaboration solution is Microsoft SharePoint, there are exciting features in SharePoint 2010 that combine PerformancePoint and the new PowerPivot software. These will leverage SSAS cubes and will not support IBM Cognos PowerCubes.
IBM Cognos environment
If your database platform is not Microsoft and your reporting environment is purely IBM Cognos Business Intelligence, then using PowerCubes is a natural extension. Analysis Services cubes may also be used in this environment if other considerations make this desirable, but all things being equal, Cognos is a better fit.
Size of data
The size of the source data and resulting cube is of paramount importance in your choice. Cognos PowerCubes have an inherent limit of 2 GB, although there are workaround techniques. Microsoft SSAS cubes are commonly 300-400 GB in size with source data measured in terabytes. Multi-terabyte SSAS cubes are in use today. SSAS also gives the ability to use relational tables for aggregation (known as ROLAP) or a hybrid (known as HOLAP). This allows for even more scalability. For large amounts of data, Microsoft is a clear winner.
Learning Curve
Microsoft SSAS requires a more technical skill set for developers than IBM Cognos PowerCubes. Microsoft cubes will require a working knowledge of the multi-dimensional language, MDX. For developers who know SQL it will look similar, but it is a paradigm shift analogous to moving from procedural languages to object-oriented languages. There will be a learning curve. The return for that investment is much more flexibility, programmability and extensibility.
IBM Cognos Transformer, the software used to design a PowerCube, was designed with the developer or power user in mind. IBM envisions strong financial analysts creating their own cubes. The result is simpler and easier to use but lacks the rich capabilities present in SSAS. Organizations with a medium amount of data and limited technical resources can build solutions more quickly using IBM Cognos cubes.
Conclusion
In conclusion, Microsoft SQL Server Analysis Services will provide more scalability, extensibility and functionality at the cost of complexity and a steeper learning curve. This makes it a more strategic choice and IBM Cognos PowerCubes a tactical choice in the appropriate situations. PowerCubes are best used with smaller sets of data in an existing Cognos environment and when the designers have a less advanced technical skill set.
A Market Street Solutions business intelligence advisory consultant with deep experience in Microsoft and Cognos data warehousing architecture, data modeling, data loading and reporting. Microsoft certified – MCITP, MCT (Microsoft Certified Trainer) - in Business Intelligence and Database Development, Wharton MBA, chess player, dog lover, and husband. I live in Chattanooga, Tennessee, with my wife Jill, son Henry, dogs Bubba and Caesar, and cats Priscilla, Elvis, Prudence and Precious.