LearnIntermediate

Pattern Searching in Strings — Sliding Windows & Prefixes

Master core string search techniques: naive sliding window, Rabin-Karp hashing, and prefix-function-based KMP.

StringsPattern Searching

Problem Statement

Given a text and a pattern, find all occurrences efficiently by choosing the right strategy—simple windowing for short patterns, hashing for rolling comparisons, or prefix functions for guaranteed linear time.

Why This Problem Matters

  • Pattern matching problems surface in DNA analysis, log search, and editor features.
  • Explaining multiple strategies signals depth: brute force baseline, then improved methods.
  • Hashing and prefix tables test your ability to reason about false positives and preprocessing trade-offs.

Thought Process

Start with the brute-force baseline

Slide the pattern over the text and compare characters. This frames the O(n * m) cost you will improve upon.

Introduce rolling hash (Rabin-Karp)

Hash the pattern and current window. Matching hashes trigger character verification, reducing average comparisons.

Reach for KMP when worst-case guarantees matter

Precompute the prefix function (LPS array) so the search never rewinds in the text, yielding O(n + m).

Step-by-Step Reasoning

  1. Compute prefix table (LPS) for the pattern.
  2. Traverse the text, advancing both indices when characters match.
  3. When mismatch occurs, jump the pattern index using LPS instead of resetting to zero.
  4. Record matches when the pattern index reaches its length.

Dry Run

Text "ababcabcabababd", Pattern "ababd"

KMP uses LPS to avoid rechecking characters. Match found at index 10.

Pattern with repeated prefix "aaaab"

LPS array builds to [0,1,2,3,0], allowing quick fallback to index 3 on mismatch.

Complexity Analysis

Time

O(n + m)

Space

O(m)

Why

KMP preprocesses the pattern in O(m) and scans the text in O(n) without backtracking.

Annotated Solution

JAVA
import java.util.ArrayList;
import java.util.List;

public class PatternSearch {
  private static int[] buildLps(String pattern) {
    int[] lps = new int[pattern.length()];
    int length = 0;
    int i = 1;

    while (i < pattern.length()) {
      if (pattern.charAt(i) == pattern.charAt(length)) {
        length += 1;
        lps[i] = length;
        i += 1;
      } else if (length != 0) {
        length = lps[length - 1];
      } else {
        lps[i] = 0;
        i += 1;
      }
    }

    return lps;
  }

  public static List<Integer> kmpSearch(String text, String pattern) {
    List<Integer> matches = new ArrayList<>();
    if (pattern.isEmpty()) {
      return matches;
    }

    int[] lps = buildLps(pattern);
    int i = 0;
    int j = 0;

    while (i < text.length()) {
      if (text.charAt(i) == pattern.charAt(j)) {
        i += 1;
        j += 1;
        if (j == pattern.length()) {
          matches.add(i - j);
          j = lps[j - 1];
        }
      } else if (j != 0) {
        j = lps[j - 1];
      } else {
        i += 1;
      }
    }

    return matches;
  }
}

KMP’s prefix table enables the search to reuse information about matched prefixes, eliminating redundant comparisons.

Common Pitfalls

  • Forgetting to verify characters after a hash match in Rabin-Karp leads to false positives.
  • Building the LPS table incorrectly causes infinite loops or missed matches.
  • Neglecting edge cases like empty pattern or pattern longer than text.

Follow-Up Questions

  • Extend to multiple pattern matching using Aho-Corasick automata.
  • Discuss rolling hash modulus choice to minimise collisions.
  • Explain how to adapt KMP for streaming input where the text arrives incrementally.

Key Takeaways

Always articulate the brute-force baseline before pivoting to faster algorithms.
Rabin-Karp excels when multiple patterns share the same length; KMP shines for single-pattern worst-case guarantees.
Prefix tables/LPS arrays are reusable structures—mention them when discussing automaton-based searches.