Designing Instagram Stories
Create an ephemeral content system that handles massive write spikes, global distribution, and automatic expiration.
Problem Statement
Design Instagram Stories: ephemeral photo/video posts that disappear after 24 hours. Users can post multiple stories, view friends' stories in a tray, and see viewer lists. The system must handle hundreds of millions of daily story creators, billions of views, and efficient content expiration.
Why This Problem Matters
- Stories represent a different access pattern from feed: write-heavy, time-bounded, and heavily cached.
- Tests understanding of ephemeral data management, media processing pipelines, and CDN strategies.
- Demonstrates ability to optimize for a specific use case rather than building a general-purpose system.
Thought Process
Understand the unique characteristics
Stories are ephemeral (24-hour TTL), append-only during their lifetime, and have a clear lifecycle. Unlike feed posts, they're not ranked—just shown in recency order. The "tray" (horizontal story list) needs fast loading.
Design the upload pipeline
User uploads media → Client compresses and sends to Upload Service → Media stored in blob storage → Transcoding service creates multiple resolutions → CDN-ready URLs stored in Story metadata → Notify followers.
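A minimal sketch of the ingest side of this pipeline. The BlobStore and TranscodeQueue interfaces, the key layout, and the rendition names are illustrative assumptions, not part of the stated design; in practice the transcoding step runs asynchronously on a worker fleet.

// Upload pipeline sketch; BlobStore and TranscodeQueue are assumed interfaces, not a real SDK.
import java.util.List;
import java.util.UUID;

interface BlobStore { void put(String key, byte[] data); }                  // e.g. backed by S3

interface TranscodeQueue { void enqueue(String mediaId, List<String> resolutions); }

public class MediaUploadService {
    private final BlobStore blobStore;
    private final TranscodeQueue transcodeQueue;

    public MediaUploadService(BlobStore blobStore, TranscodeQueue transcodeQueue) {
        this.blobStore = blobStore;
        this.transcodeQueue = transcodeQueue;
    }

    // Store the client-compressed upload, then hand off transcoding asynchronously.
    public String upload(byte[] mediaData) {
        String mediaId = UUID.randomUUID().toString();
        blobStore.put("raw/" + mediaId, mediaData);
        // Workers produce the renditions and publish CDN-ready URLs for the story metadata.
        transcodeQueue.enqueue(mediaId, List.of("480p", "720p", "1080p"));
        return mediaId;
    }
}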
Optimize the story tray
The tray shows profile pictures of users with active stories. Pre-compute and cache the tray for each user. When someone you follow posts a story, invalidate your tray cache. Use fan-out on write for the tray (not the full story content).
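A simplified in-memory stand-in for the tray cache that shows the fan-out-on-write update; a production deployment would more likely use Redis sorted sets keyed by follower, with the latest-story timestamp as the score. The class and method names here are illustrative.

// In-memory tray cache sketch; stores only author ids and timestamps, never media.
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryTrayCache {
    // follower id -> (author id -> timestamp of that author's most recent story)
    private final Map<String, Map<String, Instant>> trays = new ConcurrentHashMap<>();

    // Fan-out on write: one lightweight entry per follower of the author.
    public void addToTrays(List<String> followerIds, String authorId) {
        Instant now = Instant.now();
        for (String followerId : followerIds) {
            trays.computeIfAbsent(followerId, id -> new ConcurrentHashMap<>())
                 .put(authorId, now);
        }
    }

    // Author ids with (possibly) active stories for this user, most recent first.
    public List<String> get(String userId) {
        return trays.getOrDefault(userId, Map.of()).entrySet().stream()
                .sorted(Map.Entry.<String, Instant>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}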
Plan content expiration
Each story has a created_at timestamp. Stories older than 24 hours are expired. Options: (1) Lazy deletion—check TTL on read, (2) Background job that scans and deletes expired content, (3) Use database TTL features if available.
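A sketch of option (2), the background sweeper, assuming a hypothetical StoryExpiryStore that can list and delete expired stories; the batch size and one-minute cadence are illustrative.

// Background expiration sweeper sketch; StoryExpiryStore is an assumed interface.
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

interface StoryExpiryStore {
    List<String> findExpiredStoryIds(Instant cutoff, int limit);
    void delete(String storyId);   // removes metadata, blob objects, and issues a CDN purge
}

public class ExpirationSweeper {
    private final StoryExpiryStore store;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public ExpirationSweeper(StoryExpiryStore store) {
        this.store = store;
    }

    public void start() {
        // Run every minute; delete in bounded batches so a backlog cannot overwhelm the store.
        scheduler.scheduleAtFixedRate(this::sweep, 1, 1, TimeUnit.MINUTES);
    }

    private void sweep() {
        for (String storyId : store.findExpiredStoryIds(Instant.now(), 1000)) {
            store.delete(storyId);
        }
    }
}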
Handle viewer tracking
When someone views your story, record the view. Store viewer lists per story (not per story item). Use approximate counting for view counts (HyperLogLog) and paginated lists for viewer details.
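A sketch of viewer tracking that assumes Redis via the Jedis client: a HyperLogLog provides the approximate unique-view count, and a per-story sorted set keyed by view time supports paginated viewer lists. Key names and connection details are illustrative, and in practice these keys would carry an expiry matching the story's TTL.

// Viewer tracking sketch assuming Redis (Jedis client); not part of the original design text.
import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.Jedis;

public class ViewerTracker {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Record a view: HyperLogLog for the approximate count, sorted set for the detailed list.
    public void recordView(String storyId, String viewerId) {
        jedis.pfadd("story:views:hll:" + storyId, viewerId);
        jedis.zadd("story:viewers:" + storyId, System.currentTimeMillis(), viewerId);
    }

    // Approximate unique viewers (Redis HyperLogLog has roughly 0.81% standard error).
    public long approximateViewCount(String storyId) {
        return jedis.pfcount("story:views:hll:" + storyId);
    }

    // One page of viewers, most recent first.
    public List<String> viewerPage(String storyId, int page, int pageSize) {
        long start = (long) page * pageSize;
        return new ArrayList<>(jedis.zrevrange("story:viewers:" + storyId, start, start + pageSize - 1));
    }
}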
Step-by-Step Reasoning
- Upload: Client captures photo/video → Compresses locally → Uploads to Media Service → Stored in S3/blob storage → Transcoding job creates 480p, 720p, 1080p versions.
- Publish: Story metadata (user_id, media_url, created_at, expires_at) saved to Stories DB → Fan-out: add to followers' story trays (lightweight: just user_id + has_story flag). A metadata sketch follows this list.
- View Tray: Fetch user's cached story tray → For each user with active stories, fetch thumbnail → Display ordered by each user's most recent story time.
- View Story: Fetch story metadata for selected user → Stream media from CDN → Record view in Viewers table (async write).
- Expiration: Background worker runs every minute → Queries stories where expires_at < now → Deletes from DB and blob storage → Invalidates CDN cache.
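A minimal sketch of the story metadata record saved in the Publish step, using the fields listed above; the surrounding storage layer and identifier scheme are assumptions.

// Story metadata record as described in the Publish step.
import java.time.Duration;
import java.time.Instant;

public record StoryMetadata(
        String storyId,
        String userId,
        String mediaUrl,      // CDN-ready URL produced by the transcoding pipeline
        Instant createdAt,
        Instant expiresAt) {

    public static StoryMetadata create(String storyId, String userId, String mediaUrl) {
        Instant now = Instant.now();
        return new StoryMetadata(storyId, userId, mediaUrl, now, now.plus(Duration.ofHours(24)));
    }

    public boolean isExpired() {
        return Instant.now().isAfter(expiresAt);
    }
}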
Dry Run
Alice posts a story (Alice has 1000 followers)
Media uploaded and transcoded (~2s) → Story metadata saved → Fan-out service adds Alice to 1000 followers' story trays (batched writes, ~100ms; see the batching sketch after this dry run).
Bob opens Instagram, sees story tray
Fetch Bob's cached tray (50 users with active stories) → Load profile thumbnails from CDN → Display in 200ms.
Alice's story expires after 24 hours
Background job finds expired story → Deletes metadata from DB → Sends CDN purge request → Removes Alice from followers' trays (lazy: on next tray fetch).
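A sketch of the batched fan-out writes mentioned in the first scenario; the batch size of 100, the thread pool size, and the TrayWriter interface are illustrative assumptions.

// Batched fan-out sketch; TrayWriter stands in for the tray cache's batch-write API.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

interface TrayWriter {
    void addToTrays(List<String> followerIds, String authorId);
}

public class FanoutService {
    private final TrayWriter trayWriter;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public FanoutService(TrayWriter trayWriter) {
        this.trayWriter = trayWriter;
    }

    // Split the follower list into batches of 100 and write them in parallel,
    // so 1000 followers become 10 concurrent cache writes.
    public void fanOut(String authorId, List<String> followerIds) {
        int batchSize = 100;
        for (int i = 0; i < followerIds.size(); i += batchSize) {
            List<String> batch = followerIds.subList(i, Math.min(i + batchSize, followerIds.size()));
            pool.submit(() -> trayWriter.addToTrays(batch, authorId));
        }
    }
}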
Complexity Analysis
Time
Upload: O(transcoding_time) ~2-5s. Tray fetch: O(1) with cache, O(following_count) on miss. View story: O(story_count) for that user.
Space
O(stories × resolutions) for media storage. O(users × tray_size) for tray caches. O(stories × viewers) for viewer tracking.
Why
The ephemeral nature bounds storage: roughly 2B stories are created daily, but only 24 hours' worth is retained at any time. The CDN absorbs the read amplification.
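As a rough check, assuming an average of about 2 MB of transcoded media per story (an assumed figure, not one given in the problem statement), 2B daily stories amount to roughly 2B × 2 MB ≈ 4 PB of media; because every object expires after 24 hours, the hot storage footprint stays near that level instead of growing without bound.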
Annotated Solution
// Story Service pseudocode. Collaborator types (MediaService, StoryStore, TrayCache,
// FollowGraph, Story, TrayItem, MediaResult) are assumed to exist elsewhere.
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StoryService {
    private MediaService mediaService;
    private StoryStore storyStore;
    private TrayCache trayCache;
    private FollowGraph followGraph;

    public Story createStory(String userId, byte[] mediaData) {
        // 1. Upload and process media (transcoding is the slow step, typically a few seconds)
        MediaResult media = mediaService.uploadAndTranscode(mediaData);

        // 2. Create story record with a 24-hour expiry
        Instant now = Instant.now();
        Story story = new Story();
        story.setUserId(userId);
        story.setMediaUrl(media.getCdnUrl());
        story.setCreatedAt(now);
        story.setExpiresAt(now.plus(Duration.ofHours(24)));
        storyStore.save(story);

        // 3. Lightweight fan-out: mark this user as having a story in followers' trays
        List<String> followers = followGraph.getFollowers(userId);
        trayCache.addToTrays(followers, userId);
        return story;
    }

    public List<TrayItem> getStoryTray(String userId) {
        // Get cached tray; on a miss, rebuild from the follow graph (O(following_count), omitted here)
        List<String> usersWithStories = trayCache.get(userId);
        if (usersWithStories == null) {
            usersWithStories = List.of();
        }

        // Filter expired entries (lazy cleanup)
        List<TrayItem> items = new ArrayList<>();
        for (String storyUserId : usersWithStories) {
            Story latestStory = storyStore.getLatest(storyUserId);
            if (latestStory != null && !latestStory.isExpired()) {
                items.add(new TrayItem(storyUserId, latestStory.getCreatedAt()));
            }
        }

        // Sort so the most recently updated trays appear first
        items.sort(Comparator.comparing(TrayItem::getLatestStoryTime).reversed());
        return items;
    }
}

Stories are write-heavy with predictable expiration. The tray cache enables fast loading of the story list, while lazy expiration cleanup handles the 24-hour TTL.
Common Pitfalls
- Write spikes: New Year's Eve can see 10x the normal upload rate. Use queue-based ingestion with auto-scaling workers (see the sketch after this list).
- Tray staleness: If tray cache is too stale, users miss new stories. Balance cache TTL with freshness requirements.
- Viewer list explosion: Popular accounts can have millions of viewers. Use sampling or approximate counts for display.
- Time zone confusion: the 24-hour window should be computed from a UTC creation timestamp, not local time, so a story expires at the same moment for every viewer.
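A sketch of the queue-based ingestion suggested for write spikes, using a bounded in-process queue as a stand-in for a real broker such as Kafka or SQS; queue capacity and worker counts are illustrative.

// Queue-based ingestion sketch; the bounded queue applies back-pressure during spikes.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UploadIngestion {
    private static final int WORKER_COUNT = 16;   // in production, scaled with queue depth

    private final BlockingQueue<byte[]> uploads = new ArrayBlockingQueue<>(10_000);
    private final ExecutorService workers = Executors.newFixedThreadPool(WORKER_COUNT);

    // Producers block (or shed load) when the queue is full instead of overwhelming transcoding.
    public void enqueue(byte[] mediaData) throws InterruptedException {
        uploads.put(mediaData);
    }

    public void startWorkers() {
        for (int i = 0; i < WORKER_COUNT; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        byte[] media = uploads.take();
                        transcode(media);   // placeholder for the real media pipeline hand-off
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    private void transcode(byte[] media) { /* hand off to the media processing pipeline */ }
}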
Follow-Up Questions
- How would you implement story highlights (permanent saved stories)? (Copy to different storage, no TTL, separate access pattern)
- How do you handle story replies/reactions? (Ephemeral messaging, linked to story_id, expire with story)
- How would you implement close friends lists? (ACL per story, filter at fan-out time)