The Knuth-Morris-Pratt matching algorithm is given in pseudocode below as the procedure KMP-MATCHER. It is mostly modeled after FINITE-AUTOMATON-MATCHER, as we shall see. KMP-MATCHER calls the auxiliary procedure COMPUTE-PREFIX-FUNCTION to compute π.
KMP-MATCHER(T, P) 1 n ← length[T] 2 m ← length[P] 3 π ← COMPUTE-PREFIX-FUNCTION(P) 4 q ← 0 ▹Number of characters matched. 5 for i ← 1 to n ▹Scan the text from left to right. 6 do while q > 0 and P[q 1] ≠ T[i] 7 do q ← π[q] ▹Next character does not match. 8 if P[q 1] = T[i] 9 then q ← q 1 ▹Next character matches. 10 if q = m ▹Is all of P matched? 11 then print "Pattern occurs with shift" i - m 12 q ← π[q] ▹Look for the next match.
COMPUTE-PREFIX-FUNCTION(P) 1 m ← length[P] 2 π[1] ← 0 3 k ← 0 4 for q ← 2 to m 5 do while k > 0 and P[k 1] ≠ P[q] 6 do k ← π[k] 7 if P[k 1] = P[q] 8 then k ← k 1 9 π[q] ← k 10 return π
We begin with an analysis of the running times of these procedures. Proving these procedures correct will be more complicated.
The running time of COMPUTE-PREFIX-FUNCTION is Θ(m), using the potential method of amortized analysis. We associate a potential of k with the current state k of the algorithm. This potential has an initial value of 0, by line 3. Line 6 decreases k whenever it is executed, since π[k] < k. Since π[k] ≥ 0 for all k, however, k can never become negative. The only other line that affects k is line 8, which increases k by at most one during each execution of the for loop body. Since k < q upon entering the for loop, and since q is incremented in each iteration of the for loop body, k < q always holds. (This justifies the claim that π[q] < q as well, by line 9.) We can pay for each execution of the while loop body on line 6 with the corresponding decrease in the potential function, since π[k] < k. Line 8 increases the potential function by at most one, so that the amortized cost of the loop body on lines 5-9 is O(1). Since the number of outer-loop iterations is Θ(m), and since the final potential function is at least as great as the initial potential function, the total actual worst-case running time of COMPUTE-PREFIX-FUNCTION is Θ(m).
A similar amortized analysis, using the value of q as the potential function, shows that the matching time of KMP-MATCHER is Θ(n).
Compared to FINITE-AUTOMATON-MATCHER, by using π rather than δ, we have reduced the time for preprocessing the pattern from O(m |Σ|) to Θ(m), while keeping the actual matching time bounded by Θ(n).
We start with an essential lemma showing that by iterating the prefix function π, we can enumerate all the prefixes Pk that are proper suffixes of a given prefix Pq. Let
π*[q] = {π[q], π(2)[q], π(3)[q], . . . , π(t)[q]},
where π(i)[q] is defined in terms of functional iteration, so that π(0)[q] = q and π(i 1)[q] = π[π(i)[q]] for i ≥ 1, and where it is understood that the sequence in π*[q] stops when π(t)[q] = 0 is reached. The following lemma characterizes π*[q], as Figure 32.11 illustrates.