SKEDSOFT

Data Mining & Data Warehousing

Introduction: A discrete ordinal variable resembles a categorical variable, except that the M states of the ordinal value are ordered in a meaningful sequence. Ordinal variables are very useful for registering subjective assessments of qualities that cannot be measured objectively. For example, professional ranks are often enumerated in a sequential order, such as assistant, associate, and full for professors.

A continuous ordinal variable looks like a set of continuous data of an unknown scale; that is, the relative ordering of the values is essential but their actual magnitude is not. For example, the relative ranking in a particular sport (e.g., gold, silver, bronze) is often more essential than the actual values of a particular measure. Ordinal variables may also be obtained from the discretization of interval-scaled quantities by splitting the value range into a finite number of classes. The values of an ordinal variable can be mapped to ranks. For example, suppose that an ordinal variable f has Mf states. These ordered states define the ranking 1, ..., Mf.
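As a minimal sketch of the discretization idea, the snippet below splits an interval-scaled value range into ordered classes and maps each value to the rank of its class. The cut points (50 and 80), the class labels, and the sample scores are hypothetical and chosen only for illustration; the function name discretize is likewise an assumption, not part of the source.

```python
from bisect import bisect_right


def discretize(value, cut_points):
    """Return the 1-based rank of the class that `value` falls into.

    `cut_points` must be sorted in ascending order; k cut points
    define k + 1 ordered classes, ranked 1, ..., k + 1.
    """
    return bisect_right(cut_points, value) + 1


# Hypothetical cut points: below 50 -> rank 1, 50 up to (not including) 80 -> rank 2,
# 80 and above -> rank 3.
cut_points = [50, 80]
labels = {1: "fair", 2: "good", 3: "excellent"}

# Hypothetical interval-scaled scores.
scores = [45, 50, 79.5, 92]
ranks = [discretize(s, cut_points) for s in scores]

print(ranks)                       # [1, 2, 2, 3]
print([labels[r] for r in ranks])  # ['fair', 'good', 'good', 'excellent']
```

After this step only the ordering of the classes is kept; the original magnitudes of the scores no longer play a role.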

The treatment of ordinal variables is quite similar to that of interval-scaled variables when computing the dissimilarity between objects. Suppose that f is a variable from a set of ordinal variables describing n objects. The dissimilarity computation with respect to f involves the following steps:

1.       The value of f for the ith object is xif, and f has Mf ordered states, representing the ranking 1, ..., Mf. Replace each xif by its corresponding rank, rif ∈ {1, ..., Mf}.

2.       Since each ordinal variable can have a different number of states, it is often necessary to map the range of each variable onto [0.0, 1.0] so that each variable has equal weight. This can be achieved by replacing the rank rif of the ith object in the fth variable by

zif = (rif – 1) / (Mf – 1)

3.       Dissimilarity can then be computed using any of the distance measures described in Section 7.2.1 for interval-scaled variables, using zif to represent the f value for the ith object.
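The three steps above can be sketched in Python as follows. This is an illustrative outline rather than a reference implementation: the function names (to_rank, normalize_rank, ordinal_distance) and the two sample variables are assumptions made for the example, and Euclidean distance is used as one possible choice of distance measure.

```python
import math


def to_rank(value, ordered_states):
    """Step 1: map a state to its 1-based rank in the ordered list of states."""
    return ordered_states.index(value) + 1


def normalize_rank(rank, num_states):
    """Step 2: map a rank in {1, ..., Mf} onto [0.0, 1.0]."""
    if num_states < 2:
        raise ValueError("need at least two states to normalize")
    return (rank - 1) / (num_states - 1)


def ordinal_distance(obj_a, obj_b, states_per_variable):
    """Step 3: Euclidean distance between two objects over normalized ranks."""
    total = 0.0
    for a, b, states in zip(obj_a, obj_b, states_per_variable):
        m = len(states)
        z_a = normalize_rank(to_rank(a, states), m)
        z_b = normalize_rank(to_rank(b, states), m)
        total += (z_a - z_b) ** 2
    return math.sqrt(total)


# Two hypothetical ordinal variables with different numbers of states.
rank_states = ["assistant", "associate", "full"]                       # Mf = 3
rating_states = ["poor", "fair", "good", "very good", "excellent"]     # Mf = 5

d = ordinal_distance(
    ("assistant", "poor"),
    ("full", "excellent"),
    [rank_states, rating_states],
)
print(d)  # sqrt(2): each variable contributes a squared difference of 1.0
```

Because both variables are mapped onto [0.0, 1.0], the five-state variable does not dominate the three-state one, which is the equal-weight property that step 2 is meant to provide.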

Example: Dissimilarity between ordinal variables. Suppose that we have the sample data of Table 7.3, except that this time only the object-identifier and the continuous ordinal variable, test-2, are available. There are three states for test-2, namely fair, good, and excellent, that is, Mf = 3. For step 1, if we replace each value for test-2 by its rank, the four objects are assigned the ranks 3, 1, 2, and 3, respectively. Step 2 normalizes the ranking by mapping rank 1 to 0.0, rank 2 to 0.5, and rank 3 to 1.0. For step 3, we can use, say, the Euclidean distance (Equation (7.5)), which results in the following dissimilarity matrix:
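The matrix can be reproduced with a short sketch. The test-2 values listed below are inferred from the ranks 3, 1, 2, and 3 stated in the example, not copied from Table 7.3, and the variable names are chosen for illustration. With a single variable, the Euclidean distance reduces to the absolute difference of the normalized ranks.

```python
# Ordered states of test-2, lowest to highest (Mf = 3).
states = ["fair", "good", "excellent"]

# test-2 values for objects 1 to 4, inferred from the stated ranks 3, 1, 2, 3.
test_2 = ["excellent", "fair", "good", "excellent"]

# Step 1: replace each value by its rank.
ranks = [states.index(v) + 1 for v in test_2]

# Step 2: normalize each rank onto [0.0, 1.0].
z = [(r - 1) / (len(states) - 1) for r in ranks]

# Step 3: lower-triangular dissimilarity matrix; row i holds d(i, j) for j <= i.
matrix = [[abs(z[i] - z[j]) for j in range(i + 1)] for i in range(len(z))]

for row in matrix:
    print("  ".join(f"{d:.1f}" for d in row))
# 0.0
# 1.0  0.0
# 0.5  0.5  0.0
# 0.0  1.0  0.5  0.0
```

Objects 1 and 4 share the same state and so have dissimilarity 0.0, while object 2 is at the maximum distance of 1.0 from both of them.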