SKEDSOFT

Data Mining & Data Warehousing

Introduction: As a general data structure, graphs have become increasingly important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. Mining frequent sub graph patterns for further characterization, discrimination, classification, and cluster analysis becomes an important task.

Graphs become increasingly important in modeling complicated structures, such as circuits, images, chemical compounds, protein structures, and biological networks, social networks, the Web, workflows, and XML documents. Many graph search algorithms have been developed in chemical informatics, computer vision, video indexing, and text retrieval. With the increasing demand on the analysis of large amounts of structured data, graph mining has become an active and important theme in data mining.

Among the various kinds of graph patterns, frequent substructures are the very basic patterns that can be discovered in a collection of graphs. They are useful for characterizing graph sets, discriminating different groups of graphs, classifying and clustering graphs, building graph indices, and facilitating similarity search in graph databases. Recent studies have developed several graph mining methods and applied them to the discovery of interesting patterns in various applications. For example, there have been reports on the discovery of active chemical structures in HIV-screening datasets by contrasting the support of frequent graphs between different classes. There have been studies on the use of frequent structures as features to classify chemical compounds, on the frequent graph mining technique to study protein structural families, on the detection of considerably large frequent sub pathways in metabolic networks, and on the use of frequent graph patterns for graph indexing and similarity search in graph databases.

Although graph mining may include mining frequent sub graph patterns, graph classification, clustering, and other analysis tasks, in this section we focus on mining frequent sub graphs. We look at various methods, their extensions, and applications.

Moreover, graphs that link many nodes together may form different kinds of networks, such as telecommunication networks, computer networks, biological networks, and Web and social community networks. Because such networks have been studied extensively in the context of social networks, their analysis has often been referred to as social network analysis. Furthermore, in a relational database, objects are semantically linked across multiple relations. Mining in a relational database often requires mining across multiple interconnected relations, which is similar to mining in connected graphs or networks. Such kind of mining across data relations is considered multi relational data mining.

  • A large wealth of data in today’s networks can be represented as graphs:
  • Physical connections
  • ASs commercial agreement
  • Peering connections in overlay networks
  • Relationships in social networks
  • Hyperlinks in Web pages
  • Complex interactions among entities

Often the result of the decentralized un-coordinated activity of network players.

These graphs contain valuable information for many network applications:

  • Community detection
  • Classification
  • Recommendation systems
  • Web search
  • P2P search and retrieval
  • Trust and reputation