Pattern Searching in Strings — Sliding Windows & Prefixes
Master core string search techniques: naive sliding window, Rabin-Karp hashing, and prefix-function-based KMP.
Problem Statement
Given a text and a pattern, find all occurrences efficiently by choosing the right strategy—simple windowing for short patterns, hashing for rolling comparisons, or prefix functions for guaranteed linear time.
Why This Problem Matters
- Pattern matching problems surface in DNA analysis, log search, and editor features.
- Explaining multiple strategies signals depth: brute force baseline, then improved methods.
- Hashing and prefix tables test your ability to reason about false positives and preprocessing trade-offs.
Thought Process
Start with the brute-force baseline
Slide the pattern over the text and compare characters. This frames the O(n * m) cost you will improve upon.
Introduce rolling hash (Rabin-Karp)
Hash the pattern and current window. Matching hashes trigger character verification, reducing average comparisons.
Reach for KMP when worst-case guarantees matter
Precompute the prefix function (LPS array) so the search never rewinds in the text, yielding O(n + m).
Step-by-Step Reasoning
- Compute prefix table (LPS) for the pattern.
- Traverse the text, advancing both indices when characters match.
- When mismatch occurs, jump the pattern index using LPS instead of resetting to zero.
- Record matches when the pattern index reaches its length.
Dry Run
Text "ababcabcabababd", Pattern "ababd"
KMP uses LPS to avoid rechecking characters. Match found at index 10.
Pattern with repeated prefix "aaaab"
LPS array builds to [0,1,2,3,0], allowing quick fallback to index 3 on mismatch.
Complexity Analysis
Time
O(n + m)
Space
O(m)
Why
KMP preprocesses the pattern in O(m) and scans the text in O(n) without backtracking.
Annotated Solution
import java.util.ArrayList;
import java.util.List;
public class PatternSearch {
private static int[] buildLps(String pattern) {
int[] lps = new int[pattern.length()];
int length = 0;
int i = 1;
while (i < pattern.length()) {
if (pattern.charAt(i) == pattern.charAt(length)) {
length += 1;
lps[i] = length;
i += 1;
} else if (length != 0) {
length = lps[length - 1];
} else {
lps[i] = 0;
i += 1;
}
}
return lps;
}
public static List<Integer> kmpSearch(String text, String pattern) {
List<Integer> matches = new ArrayList<>();
if (pattern.isEmpty()) {
return matches;
}
int[] lps = buildLps(pattern);
int i = 0;
int j = 0;
while (i < text.length()) {
if (text.charAt(i) == pattern.charAt(j)) {
i += 1;
j += 1;
if (j == pattern.length()) {
matches.add(i - j);
j = lps[j - 1];
}
} else if (j != 0) {
j = lps[j - 1];
} else {
i += 1;
}
}
return matches;
}
}KMP’s prefix table enables the search to reuse information about matched prefixes, eliminating redundant comparisons.
Common Pitfalls
- Forgetting to verify characters after a hash match in Rabin-Karp leads to false positives.
- Building the LPS table incorrectly causes infinite loops or missed matches.
- Neglecting edge cases like empty pattern or pattern longer than text.
Follow-Up Questions
- Extend to multiple pattern matching using Aho-Corasick automata.
- Discuss rolling hash modulus choice to minimise collisions.
- Explain how to adapt KMP for streaming input where the text arrives incrementally.