Blind 75Medium

Group Anagrams — Canonical Signature Hashing

Use a canonical key (sorted string or frequency vector) to bucket anagrams together in linearithmic time overall.

Hash MapString

View problem on LeetCode

Problem Statement

Return the groups of strings that are anagrams of each other.

Why This Problem Matters

Demonstrates transforming complex equivalence into a simple hash key.
Highlights trade-offs between sorting-based signatures and fixed alphabet counts.
Prepares you for frequency map patterns used throughout Blind 75.

Thought Process

Choose a canonical representation

Sorting characters or counting frequency ensures all anagrams share the same key.

Bucket by signature

Use a hash map from signature → list of words to collect anagrams together.

Return collected values

After processing all strings, convert the map values into the list of groups.

Step-by-Step Reasoning

Initialise a map signature → List<String>.
For each word, create a 26-length frequency key (or sorted char array).
Append the word to map.computeIfAbsent(signature).
Return the values of the map.

Dry Run

Input ["eat","tea","tan","ate","nat","bat"]

Signature for eat/tea/ate matches; tan/nat share a signature; bat stands alone.

Signature building

Frequency array converted to string such as "1#0#0#..." keeps keys collision-free.

Complexity Analysis

Time

O(n·L)

Space

O(n·L)

Why

Each word of length L requires building a signature; storage holds all words in grouped lists.

Annotated Solution

JAVA

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupAnagrams {
  public List<List<String>> groupAnagrams(String[] strs) {
    Map<String, List<String>> buckets = new HashMap<>();

    for (String word : strs) {
      int[] counts = new int[26];
      for (char c : word.toCharArray()) {
        counts[c - 'a'] += 1;
      }

      StringBuilder key = new StringBuilder();
      for (int count : counts) {
        key.append('#').append(count);
      }

      buckets.computeIfAbsent(key.toString(), ignore -> new ArrayList<>()).add(word);
    }

    return new ArrayList<>(buckets.values());
  }
}

Frequency signatures keep runtime linear in total characters and avoid sorting overhead while guaranteeing identical keys for anagrams.

Common Pitfalls

Using simple concatenation without separators can collide (e.g., [11] vs [1,1]).
Sorting per word increases runtime to O(n·L log L); clarify alphabet assumptions before choosing.
Storing char arrays as keys without converting to immutable strings results in incorrect hashing.

Follow-Up Questions

How would you handle Unicode characters? (Switch to Map<Character, Integer> signatures.)
Can you perform grouping in streaming fashion? (Process incrementally with the same map.)

Key Takeaways

Designing the right hash key often simplifies grouping problems.

Discuss time/space trade-offs between sorted signatures and frequency vectors explicitly.

Immutable keys prevent hashing issues when stored in maps.