SKEDSOFT

Data Mining & Data Warehousing

Introduction: Relational database systems have been widely used in business applications. With the progress of database technology, various kinds of advanced data and information systems have emerged and are undergoing development to address the requirements of new applications.

The new database applications include handling spatial data (such as maps), engineering design data (such as the design of buildings, system components, or integrated circuits), hypertext and multimedia data (including text, image, video, and audio data), time-related data (such as historical records or stock exchange data), stream data (such as video surveillance and sensor data, where data flow in and out like streams), and the World Wide Web (a huge, widely distributed information repository made available by the Internet). These applications require efficient data structures and scalable methods for handling complex object structures; variable-length records; semistructured or unstructured data; text, spatiotemporal, and multimedia data; and database schemas with complex structures and dynamic changes.

In response to these needs, advanced database systems and specific application-oriented database systems have been developed. These include object-relational database systems, temporal and time-series database systems, spatial and spatiotemporal database systems, text and multimedia database systems, heterogeneous and legacy database systems, data stream management systems, and Web-based global information systems.

While such databases or information repositories require sophisticated facilities to efficiently store, retrieve, and update large amounts of complex data, they also provide fertile ground for data mining and raise many challenging research and implementation issues. In this section, we describe each of the advanced database systems listed above.

Object-Relational Databases

Object-relational databases are constructed based on an object-relational data model. This model extends the relational model by providing a rich data type for handling complex objects and object orientation. Because most sophisticated database applications need to handle complex objects and structures, object-relational databases are becoming increasingly popular in industry and applications.

Conceptually, the object-relational data model inherits the essential concepts of object-oriented databases, where, in general terms, each entity is considered as an object. Following the AllElectronics example, objects can be individual employees, customers, or items. Data and code relating to an object are encapsulated into a single unit. Each object has associated with it the following:

  • A set of variables that describe the object. These correspond to attributes in the entity-relationship and relational models.
  • A set of messages that the object can use to communicate with other objects, or with the rest of the database system.
  • A set of methods, where each method holds the code to implement a message. Upon receiving a message, the method returns a value in response. For instance, the method for the message get_photo(employee) will retrieve and return a photo of the given employee object.
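The three parts of an object listed above can be sketched in Python. This is only an illustration, not how any particular object-relational system implements objects: the Employee fields, the photo attribute, and the send dispatcher are hypothetical names chosen for the example. Only the get_photo message comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Variables: these play the role of attributes in the relational model.
    name: str
    address: str
    photo: bytes = b""  # hypothetical field holding raw image data

    # Method: the code that implements the "get_photo" message.
    def get_photo(self) -> bytes:
        return self.photo

    # Hypothetical dispatcher: looks up the method that implements a message.
    def send(self, message: str, *args):
        method = getattr(self, message, None)
        if not callable(method):
            raise ValueError(
                f"{type(self).__name__} does not understand {message!r}"
            )
        return method(*args)


alice = Employee(name="Alice", address="12 Main St", photo=b"\x89PNG")
photo = alice.send("get_photo")  # the message is answered by the method
```

Here data (the variables) and code (the methods) are encapsulated in one unit, and other parts of the program interact with the object only by sending it messages.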

Objects that share a common set of properties can be grouped into an object class. Each object is an instance of its class. Object classes can be organized into class/subclass hierarchies so that each class represents properties that are common to objects in that class. For instance, an employee class can contain variables like name, address, and birthdate. Suppose that the class sales_person is a subclass of the class employee. A sales_person object would inherit all of the variables pertaining to its superclass, employee. In addition, it has all of the variables that pertain specifically to being a salesperson (e.g., commission). Such a class inheritance feature benefits information sharing.
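The employee and salesperson hierarchy described above might look as follows in Python. This is a minimal sketch: the constructor signatures and the sample values are assumptions made for illustration, and the class names are written in Python's naming style rather than as they appear in the text.

```python
class Employee:
    """Superclass: variables common to every employee."""

    def __init__(self, name, address, birthdate):
        self.name = name
        self.address = address
        self.birthdate = birthdate


class SalesPerson(Employee):
    """Subclass: inherits Employee's variables and adds its own."""

    def __init__(self, name, address, birthdate, commission):
        # Reuse the superclass initializer for the inherited variables.
        super().__init__(name, address, birthdate)
        # Variable that pertains specifically to being a salesperson.
        self.commission = commission


# Hypothetical sample values, for illustration only.
bob = SalesPerson("Bob", "34 Oak Ave", "1985-04-12", commission=0.05)
inherited = (bob.name, bob.address, bob.birthdate)  # defined in Employee
```

Because SalesPerson is a subclass, any code written for Employee objects also works on bob, which is the information sharing that inheritance provides.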

For data mining in object-relational systems, techniques need to be developed for handling complex object structures, complex data types, class and subclass hierarchies, property inheritance, and methods and procedures.