SKEDSOFT

Design Analysis Of Algorithm

The Knuth-Morris-Pratt matching algorithm is given in pseudocode below as the procedure KMP-MATCHER. It is mostly modeled after FINITE-AUTOMATON-MATCHER, as we shall see. KMP-MATCHER calls the auxiliary procedure COMPUTE-PREFIX-FUNCTION to compute π.

		KMP-MATCHER(T, P)
 1 n  length[T]
 2 m  length[P]
 3 π  COMPUTE-PREFIX-FUNCTION(P)
 4 q  0                          Number of characters matched.
 5 for i  1 to n                 Scan the text from left to right.
 6      do while q > 0 and P[q   1]  T[i]
 7             do q  π[q]    Next character does not match.
 8         if P[q   1] = T[i]
 9            then q  q   1      Next character matches.
10         if q = m                    Is all of P matched?
11            then print "Pattern occurs with shift" i - m
12                 q  π[q]    Look for the next match.
		COMPUTE-PREFIX-FUNCTION(P)
 1 m  length[P]
 2 π[1]  0
 3 k  0
 4 for q  2 to m
 5      do while k > 0 and P[k   1]  P[q]
 6             do k  π[k]
 7         if P[k   1] = P[q]
 8            then k  k   1
 9         π[q]  k
10 return π

We begin with an analysis of the running times of these procedures. Proving these procedures correct will be more complicated.

Running-time analysis

The running time of COMPUTE-PREFIX-FUNCTION is Θ(m), using the potential method of amortized analysis. We associate a potential of k with the current state k of the algorithm. This potential has an initial value of 0, by line 3. Line 6 decreases k whenever it is executed, since π[k] < k. Since π[k] 0 for all k, however, k can never become negative. The only other line that affects k is line 8, which increases k by at most one during each execution of the for loop body. Since k < q upon entering the for loop, and since q is incremented in each iteration of the for loop body, k < q always holds. (This justifies the claim that π[q] < q as well, by line 9.) We can pay for each execution of the while loop body on line 6 with the corresponding decrease in the potential function, since π[k] < k. Line 8 increases the potential function by at most one, so that the amortized cost of the loop body on lines 5-9 is O(1). Since the number of outer-loop iterations is Θ(m), and since the final potential function is at least as great as the initial potential function, the total actual worst-case running time of COMPUTE-PREFIX-FUNCTION is Θ(m).

A similar amortized analysis, using the value of q as the potential function, shows that the matching time of KMP-MATCHER is Θ(n).

Compared to FINITE-AUTOMATON-MATCHER, by using π rather than δ, we have reduced the time for preprocessing the pattern from O(m |Σ|) to Θ(m), while keeping the actual matching time bounded by Θ(n).

Correctness of the prefix-function computation

We start with an essential lemma showing that by iterating the prefix function π, we can enumerate all the prefixes Pk that are proper suffixes of a given prefix Pq. Let

π*[q] = {π[q], π(2)[q], π(3)[q], . . . , π(t)[q]},

where π(i)[q] is defined in terms of functional iteration, so that π(0)[q] = q and π(i 1)[q] = π[π(i)[q]] for i 1, and where it is understood that the sequence in π*[q] stops when π(t)[q] = 0 is reached. The following lemma characterizes π*[q], as Figure 32.11 illustrates.