Data Warehouse Architecture:
Enhance business productivity
Customer relationship management
To design a data warehouse, understand and analyze business needs and construct a business analysis framework.
The different views regarding the design are
Top-down view-selection of relevant information
Data source view-the information being captured, stored and managed by operational systems.
Data warehouse view-includes fact,dimension tables
Business query view-end user view
For building and using data warehouse needs certain skills
Business skill-how much system store, manage, hoe to build warehouse refresh software.
Technology skill-how to make assessments from quantitative information and to derive facts.
Program management skill-to interface with technologies.
A data warehouse can be built using a top-down approach, bottom-up approach and combined approach.The design and construction may consists of the following steps:
Data integration and testing
Deployment of data warehouse
Large software systems can be developed using
The design process has the following steps
Choose a business process to model eg.,orders,sales.
Choose the grain of the business process.
Choose the dimension.
Choose the measure.
A Three-Tier Data Warehouse Architecture:
Bottom Tier-relational database is formed from operational database, external resources.
Middle Tier-using relational OLAP or multidimensional OLAP.
Top Tier-querying and reporting tools,analysis,data mining tools.
There are three data warehouse models,
Enterprise warehouse-It collects all of the information about subjects spanning the entire organisation.
Data Mart-It contains a subset of corporate wide data that is of value to a specific group of users.There are two types
Independent Data Mart-data captured from operational systems/external sources/from a particular area or department.
Dependent departments-directly from enterprise data warehouse
Virtual warehouse-It is a set views over operational databases.
Types of OLAP Services:
Relational OLAP servers(ROLAP)-intermediate server that is in between a relational back end server and client front end tools.
Multidimensional OLAP Servers(MOLAP)-It support multi-dimensional views of data array-based multidimensional storage engines.The vies are mapped to data cube array structures.
Hybrid OLAP servers-combining both i.e., the grater scalability of ROLAP and faster computation of MOLAP.
Data Warehouse Implementation:
Efficient Computation of data cubes:
The compute cube operator is used for cube computation i.e., it computes aggregates over all subsets of the dimension specified in the operation.
Eg.:Compute the sum of sales grouping by item and city.
Total number of cuboid is 2n i.e., 2^2=4.If no group-by is used then it is zero dimensional operation or O-D cuboid or apex cuboid's.If there are n group be then it is n-dimensional operation.
The cube operator is the n-dimensional generalization of the group by operator.Compute cube sales. Generates all cuboid's.
If the dimension has hierarchy, then the total number of cuboid.If there are more cuboid's which may lead to storage problem, then partial materialization is the good option.
The partial materialization should consider three factors:
Identify materialization subsets of cuboid's to materialistic.
Exploit the materialization cuboid's during the query processing.
Efficiently update the materialized cuboid's during load and refresh.
In OLAP, multi way array aggregation is used for computation of data cubes,
ROLAP uses value value based addressing using tuples and relational tables.MOLAP uses direct addressing using array index.
So, differently approach is used for array-based cube construction of MOLAP.
1.Partition the array into chunks.
2.Computer aggregate by visiting cube cells.
MOLAP is significantly faster than ROLAP>
Indexing OLAP data:
For low-cardinality domains, bitmap indexing is used.A bit vector BV, exist for value V in the domain of the attribute.
For high-cardinality domains, join indexing is used. It registers join able rows of two relations from a relational database.
OLAP Query Processing:
It proceeds as following:
Determine which operation should be performed on the available cuboid's.
Determine to which materialized cuboid's the relevant operations to be applied.
Metadata are data about data i.e., it defines warehouse objects.It contains the following:
Structure description of the data warehouse.
Operational metadata-data image,currency of data,monitoring information.
Algorithms used for summarization.
Mapping from operational environment to data warehouse.
Data related to system performance.
some of the data warehouse functions are
More articles: House