Algorithms | Knuth–Morris–Pratt algorithm

In this article we talk abount KMP algorithm

Let’s define a statement:

Find all occurrence of pattern in string.

By using naive approach we get answer of O(m * n) complexity, where n - length of text and m - length of pattern. However applying KMP algorithm, this could be done of O(m + n) asymptotic complexity and O(m) space complexity.

Algorithm:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
import java.util.Arrays;

class Kmp {
public static void main(String[] args) {
String text = "ABDAABAABCCDVCABCADEEQAAA";
String pattern = "ABCA";
search(text, pattern);

}
static void search(String text, String pattern) {
System.out.format("Search [%s] in [%s]%n", pattern, text);
int [] lps = computeTable(pattern);
int textIndex = 0, textLen = text.length();
int patternIndex = 0, patternLen = pattern.length();
while(textIndex < textLen) {
// match case
if (text.charAt(textIndex) == pattern.charAt(patternIndex)) {
++textIndex;
++patternIndex;
}
if (patternIndex == patternLen) {
// find pattern in text
System.out.format("Find match at index %s%n",textIndex - patternLen);
patternIndex = lps[patternIndex - 1];
} else if (textIndex < textLen && pattern.charAt(patternIndex) != text.charAt(textIndex)) {
// mismatch case
if (patternIndex != 0) {
// go to prev precomputed value
patternIndex = lps[patternIndex - 1];
} else {
++textIndex;
}
}
}
}
static int[] computeTable(String pattern) {
int m = pattern.length();
int [] lps = new int [m];
int len = 0;
int index = 1;
while(index < m) {
if (pattern.charAt(index) == pattern.charAt(len)) {
lps[index++] = ++len;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
++index;
}
}
}
return lps;

}
}

Main idea is next:

  1. Pre compute longest proper prefix which is also suffix(lps)
  2. Define textIndex, patternIndex as iteration values in cycle
  3. Iterate in text and do comparison between corresponding indexes of text and pattern:
    1. If match case, increment both indexes of text and pattern
    2. In case when index of pattern equals to lenth of pattern, it means we finded a match
    3. In another case decrement corresponding index of pattern until we find a match or it will become a zero. In zero case increment text index