SKEDSOFT

Data Mining & Data Warehousing

 

Introduction: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management decision making process.”

The Design of a Data Warehouse: To design an effective data warehouse we need to understand and analyze business needs and construct a business analysis framework. The construction of a large and complex information system can be viewed as the construction of a large and complex building, for which the owner, architect, and builder have different views. These views are combined to form a complex framework that represents the top-down, business-driven, or owner’s perspective, as well as the bottom-up, builder-driven, or implementer’s view of the information system.

Four different views regarding the design of a data warehouse must be considered: the top-down view, the data source view, the data warehouse view, and the business query view.

The top-down view allows the selection of the relevant information necessary for the data warehouse. This information matches the current and future business needs.

The data source view exposes the information being captured, stored, and managed by operational systems. This information may be documented at various levels of detail and accuracy, from individual data source tables to integrated data source tables. Data sources are often modeled by traditional data modeling techniques, such as the entity-relationship model or CASE (computer-aided software engineering) tools.

The data warehouse view includes fact tables and dimension tables. It represents the information that is stored inside the data warehouse, including pre-calculated totals and counts, as well as information regarding the source, date, and time of origin, added to provide historical context.

 Finally, the business query view is the perspective of data in the data warehouse from the viewpoint of the end user.

Building and using a data warehouse is a complex task because it requires business skills, technology skills, and program management skills. Regarding business skills, buildinga data warehouse involves understanding how such systems store and manages their data,how to build extractors that transfer data from the operational system to the data warehouse,and how to build warehouse refresh software that keeps the data warehouse reasonablyup-to-date with the operational system’s data. Using a data warehouse involvesunderstanding the significance of the data it contains as well as understanding and translatingthe business requirements into queries that can be satisfied by the data warehouse.

Regarding technology skills, data analysts are required to understand how to make assessments from quantitative information and derive facts based on conclusions from historical information in the data warehouse. These skills include the ability to discover patterns and trends, to extrapolate trends based on history and look for anomalies or paradigm shifts, and to present coherent managerial recommendations based on such analysis. Finally, program management skills involve the need to interface with many technologies, vendors, and end users in order to deliver results in a timely and cost-effective manner.