SKEDSOFT

Data Mining & Data Warehousing

Introduction: A spatial data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes.

Example: Spatial data cube and spatial OLAP. There are about 3,000 weather probes distributed in British Columbia (BC), Canada, each recording daily temperature and precipitation for a designated small area and transmitting signals to a provincial weather station. With a spatial data warehouse that supports spatial OLAP, a user can view weather patterns on a map by month, by region, and by different combinations of temperature and precipitation, and can dynamically drill down or roll up along any dimension to explore desired patterns, such as “wet and hot regions in the Fraser Valley in summer 1999.”
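The roll-up described in this example can be sketched in a few lines of Python. This is only an illustration under assumed data: the `readings` rows, the `roll_up` helper, and the choice of mean temperature and total precipitation as aggregates are hypothetical and do not come from any real weather system.

```python
from collections import defaultdict

# Hypothetical daily readings: (region, month, temperature_c, precipitation_mm).
readings = [
    ("Fraser Valley", "1999-07", 26.0, 4.0),
    ("Fraser Valley", "1999-07", 28.0, 6.0),
    ("Fraser Valley", "1999-08", 24.0, 2.0),
    ("Okanagan", "1999-07", 30.0, 0.0),
]


def roll_up(rows, key):
    """Aggregate rows into the cells named by key(row).

    Each cell holds (mean temperature, total precipitation).
    """
    acc = defaultdict(lambda: [0.0, 0, 0.0])  # temperature sum, count, precipitation sum
    for row in rows:
        cell = acc[key(row)]
        cell[0] += row[2]
        cell[1] += 1
        cell[2] += row[3]
    return {k: (t_sum / n, p_sum) for k, (t_sum, n, p_sum) in acc.items()}


# Finer level: one cell per (region, month).
by_region_month = roll_up(readings, key=lambda r: (r[0], r[1]))
# Roll up along the time dimension: one cell per region.
by_region = roll_up(readings, key=lambda r: r[0])
```

Drilling down is the reverse step: moving from `by_region` back to `by_region_month` by choosing a finer grouping key.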

There are several challenging issues regarding the construction and utilization of spatial data warehouses. The first challenge is the integration of spatial data from heterogeneous sources and systems. Spatial data are usually stored in different industry firms and government agencies using various data formats. Data formats are not only structure-specific (e.g., raster- vs. vector-based spatial data, object-oriented vs. relational models, different spatial storage and indexing structures), but also vendor-specific (e.g., ESRI, MapInfo, Intergraph). There has been a great deal of work on the integration and exchange of heterogeneous spatial data, which has paved the way for spatial data integration and spatial data warehouse construction.

The second challenge is the realization of fast and flexible on-line analytical processing in spatial data warehouses. The star schema model is a good choice for modeling spatial data warehouses because it provides a concise and organized warehouse structure and facilitates OLAP operations. However, in a spatial warehouse, both dimensions and measures may contain spatial components.

There are three types of dimensions in a spatial data cube:

A non-spatial dimension contains only non-spatial data. The non-spatial dimensions temperature and precipitation can be constructed for the warehouse in the weather-probe example above, since each contains non-spatial data whose generalizations are also non-spatial (such as “hot” for temperature and “wet” for precipitation).

A spatial-to-non-spatial dimension is a dimension whose primitive-level data are spatial but whose generalization, starting at a certain high level, becomes non-spatial. For example, the spatial dimension city relays geographic data for the U.S. map. Suppose that the dimension’s spatial representation of, say, Seattle is generalized to the string “pacific northwest.” Although “pacific northwest” is a spatial concept, its representation is not spatial (since, in our example, it is a string). It therefore plays the role of a non-spatial dimension.

A spatial-to-spatial dimension is a dimension whose primitive level and all of its high-level generalized data are spatial. For example, the dimension equi_temperature_region contains spatial data, as do all of its generalizations, such as regions covering 0-5 degrees (Celsius), 5-10 degrees, and so on.
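The three dimension types differ only in which levels of the concept hierarchy hold a spatial representation. The sketch below makes that rule explicit. The `Polygon` class, the coordinates, and the `classify_dimension` helper are hypothetical names introduced for illustration; a real system would use its own geometry type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polygon:
    """Toy spatial representation: a tuple of (longitude, latitude) vertices."""
    vertices: tuple


def is_spatial(value):
    """A value counts as spatial here only if it is stored as a Polygon."""
    return isinstance(value, Polygon)


def classify_dimension(levels):
    """Classify a dimension from one sample value per hierarchy level.

    `levels` is ordered from the primitive level to the most general level.
    """
    flags = [is_spatial(v) for v in levels]
    if not any(flags):
        return "non-spatial"
    if all(flags):
        return "spatial-to-spatial"
    # Spatial at the bottom, non-spatial from some level upward, never spatial again.
    if flags[0] and flags == sorted(flags, reverse=True):
        return "spatial-to-non-spatial"
    return "unclassified"


# Made-up outline for Seattle; the coordinates are illustrative only.
seattle = Polygon(((-122.44, 47.50), (-122.24, 47.50), (-122.24, 47.73), (-122.44, 47.73)))

temperature = [12.5, "hot"]                 # numbers generalize to labels
city = [seattle, "pacific northwest"]       # geometry generalizes to a string
equi_temperature_region = [seattle, seattle]  # geometry at every level
```

Note that "pacific northwest" is classified by how it is stored (a string), not by what it means, which matches the reasoning in the city example.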

We distinguish two types of measures in a spatial data cube:

A numerical measure contains only numerical data. For example, one measure in a spatial data warehouse could be the monthly revenue of a region, so that a roll-up may compute the total revenue by year, by county, and so on. Numerical measures can be further classified into distributive, algebraic, and holistic, as discussed in Chapter 3.

A spatial measure contains a collection of pointers to spatial objects. For example, in a generalization (or roll-up) in the spatial data cube of the weather-probe example above, the regions with the same range of temperature and precipitation will be grouped into the same cell, and the measure so formed contains a collection of pointers to those regions.
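A spatial measure can be pictured as a mapping from each cube cell to the identifiers of the regions that fall in it. The following is a minimal sketch under assumed data: the region identifiers, the band thresholds (20 and 10 degrees Celsius, 100 mm), and the helper names are all hypothetical, and plain string identifiers stand in for pointers to spatial objects.

```python
from collections import defaultdict

# Hypothetical regions: (region_id, average_temperature_c, precipitation_mm).
regions = [
    ("R1", 24.0, 210.0),
    ("R2", 26.5, 190.0),
    ("R3", 8.0, 40.0),
    ("R4", 23.0, 30.0),
]


def temperature_band(t):
    # Thresholds are assumptions chosen for the example.
    return "hot" if t >= 20 else "mild" if t >= 10 else "cold"


def precipitation_band(p):
    # Threshold is an assumption chosen for the example.
    return "wet" if p >= 100 else "dry"


def build_spatial_measure(rows):
    """Group region ids into cells keyed by (temperature band, precipitation band)."""
    cells = defaultdict(list)
    for region_id, t, p in rows:
        cells[(temperature_band(t), precipitation_band(p))].append(region_id)
    return dict(cells)


def roll_up_to_temperature(cells):
    """Drop the precipitation dimension by merging the pointer collections."""
    merged = defaultdict(list)
    for (t_band, _p_band), ids in cells.items():
        merged[t_band].extend(ids)
    return dict(merged)


cube = build_spatial_measure(regions)
by_temperature = roll_up_to_temperature(cube)
```

In this sketch a roll-up only concatenates identifier lists. A real spatial OLAP system may also need to merge the underlying geometries, which is a more expensive operation and is not shown here.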

A non-spatial data cube contains only non-spatial dimensions and numerical measures. If a spatial data cube contains spatial dimensions but no spatial measures, its OLAP operations, such as drilling or pivoting, can be implemented in a manner similar to that for non-spatial data cubes.