Group Anagrams — Canonical Signature Hashing
Use a canonical key (sorted string or frequency vector) to bucket anagrams together in linearithmic time overall.
Problem Statement
Return the groups of strings that are anagrams of each other.
Why This Problem Matters
- Demonstrates transforming complex equivalence into a simple hash key.
- Highlights trade-offs between sorting-based signatures and fixed alphabet counts.
- Prepares you for frequency map patterns used throughout Blind 75.
Thought Process
Choose a canonical representation
Sorting characters or counting frequency ensures all anagrams share the same key.
Bucket by signature
Use a hash map from signature → list of words to collect anagrams together.
Return collected values
After processing all strings, convert the map values into the list of groups.
Step-by-Step Reasoning
- Initialise a map signature → List<String>.
- For each word, create a 26-length frequency key (or sorted char array).
- Append the word to map.computeIfAbsent(signature).
- Return the values of the map.
Dry Run
Input ["eat","tea","tan","ate","nat","bat"]
Signature for eat/tea/ate matches; tan/nat share a signature; bat stands alone.
Signature building
Frequency array converted to string such as "1#0#0#..." keeps keys collision-free.
Complexity Analysis
Time
O(n·L)
Space
O(n·L)
Why
Each word of length L requires building a signature; storage holds all words in grouped lists.
Annotated Solution
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class GroupAnagrams {
public List<List<String>> groupAnagrams(String[] strs) {
Map<String, List<String>> buckets = new HashMap<>();
for (String word : strs) {
int[] counts = new int[26];
for (char c : word.toCharArray()) {
counts[c - 'a'] += 1;
}
StringBuilder key = new StringBuilder();
for (int count : counts) {
key.append('#').append(count);
}
buckets.computeIfAbsent(key.toString(), ignore -> new ArrayList<>()).add(word);
}
return new ArrayList<>(buckets.values());
}
}Frequency signatures keep runtime linear in total characters and avoid sorting overhead while guaranteeing identical keys for anagrams.
Common Pitfalls
- Using simple concatenation without separators can collide (e.g., [11] vs [1,1]).
- Sorting per word increases runtime to O(n·L log L); clarify alphabet assumptions before choosing.
- Storing char arrays as keys without converting to immutable strings results in incorrect hashing.
Follow-Up Questions
- How would you handle Unicode characters? (Switch to Map<Character, Integer> signatures.)
- Can you perform grouping in streaming fashion? (Process incrementally with the same map.)