When building LyricLingo, my goal was to create a tool that would help language learners use popular music as a study aid. The application connects to Spotify, fetches the current song, and generates flashcards showing the original lyrics and their translation. What seemed simple at first became a complex engineering challenge, requiring optimization to deliver a seamless user experience while managing API costs and server resources.
This post dives into two key optimizations that dramatically improved performance: Redis caching and lyric deduplication. These techniques cut translation API costs by roughly 40%, dropped cache-hit response times from ~800ms to ~200ms, and significantly improved the application's scalability.
The Challenge: Scaling Lyrics Translation
LyricLingo faced several technical challenges during development:
- API Costs: Translation APIs charge per character, making full song translations expensive at scale.
- Latency: External API calls to Genius (for lyrics) and translation services added significant latency.
- Repetition: Songs often contain repeated lines (choruses, refrains), resulting in redundant translation requests.
- User Experience: Users expect near-instant flashcard generation when logging a song.
Solution 1: Redis Caching Implementation
The first optimization I implemented was Redis caching, which stores translation results, flashcards, and sentiment analysis so that repeat requests never have to hit the external APIs. Here's how I set it up:
const express = require("express");
const Redis = require("ioredis");
// Other imports...

// Use the REDIS_URL environment variable, or fall back to localhost if not set
const redisURL = process.env.REDIS_URL || "redis://127.0.0.1:6379";
const redis = new Redis(redisURL);

// Log Redis connection errors
redis.on("error", (err) => {
  console.error("Redis error:", err);
});

// Make the Redis client available globally for use in controllers
// (new Redis() always returns a client object, so no truthiness check is needed)
global.redisClient = redis;
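For production use, ioredis also accepts a second options argument; a minimal hardening sketch (these particular options are my choice for illustration, not part of LyricLingo's actual setup):

// Illustrative connection options — not from the original code
const redis = new Redis(redisURL, {
  // Reconnect with capped exponential backoff
  retryStrategy: (times) => Math.min(times * 50, 2000),
  // Fail individual commands after a few retries rather than queueing indefinitely
  maxRetriesPerRequest: 3,
});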
I designed a structured caching strategy with user-specific cache keys, ensuring data privacy and efficient retrieval:
/**
* Generate cache keys for a user's song data
* @param {string} userId - User ID
* @param {string} songTitle - Song title
* @param {string} artist - Artist name (optional)
* @param {string} language - Language code (optional)
* @returns {Object} Object containing different cache keys
*/
const generateCacheKeys = (userId, songTitle, artist = null, language = null) => {
  return {
    flashcards: `flashcards:${userId}:${songTitle}${language ? ':' + language : ''}`,
    sentiment: `sentiment:${userId}:${songTitle}${artist ? ':' + artist : ''}`,
    translations: `translations:${userId}:${songTitle}${language ? ':' + language : ''}`
  };
};
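For a hypothetical user and song, this produces namespaced keys like the following (the IDs and titles here are invented for illustration):

// Illustrative output — the user ID, song, and artist are made up
const keys = generateCacheKeys("user_42", "Despacito", "Luis Fonsi", "ES");
// keys.flashcards   → "flashcards:user_42:Despacito:ES"
// keys.sentiment    → "sentiment:user_42:Despacito:Luis Fonsi"
// keys.translations → "translations:user_42:Despacito:ES"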
The caching system needed to be intelligent about invalidation. When a user deleted a song or changed language preferences, related caches had to be cleared:
/**
* Clear all caches related to a specific song
* @param {string} userId - User ID
* @param {string} songTitle - Song title
* @param {string} artist - Artist name (optional)
*/
const clearSongCaches = async (userId, songTitle, artist = null) => {
  if (!redis) return;
  try {
    const cacheKeys = generateCacheKeys(userId, songTitle, artist);

    // Build a list of language-specific keys to clear
    const languageKeys = [];
    const supportedLanguages = ['ES', 'FR', 'PT', 'IT', 'DE', 'JA', 'ZH', 'RU', 'KO'];

    // Add language-specific cache keys
    for (const lang of supportedLanguages) {
      languageKeys.push(`flashcards:${userId}:${songTitle}:${lang}`);
      languageKeys.push(`translations:${userId}:${songTitle}:${lang}`);
    }

    // Combine all keys to delete
    const keysToDelete = [
      cacheKeys.flashcards,
      cacheKeys.sentiment,
      cacheKeys.translations,
      ...languageKeys
    ];

    // Execute multi-delete
    if (keysToDelete.length > 0) {
      await redis.del(...keysToDelete);
      console.log(`🗑️ Cleared all caches for song "${songTitle}" (user: ${userId})`);
    }
  } catch (error) {
    console.error(`❌ Error clearing caches for song "${songTitle}":`, error);
  }
};
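Hard-coding the language list works, but it has to stay in sync as languages are added. A pattern-based alternative is also possible with ioredis's scanStream; a sketch (my variation, not LyricLingo's actual approach):

// Pattern-based clearing (sketch) — SCAN iterates without blocking Redis, unlike KEYS
const clearSongCachesByPattern = async (userId, songTitle) => {
  const patterns = [
    `flashcards:${userId}:${songTitle}*`,
    `translations:${userId}:${songTitle}*`,
    `sentiment:${userId}:${songTitle}*`,
  ];
  for (const pattern of patterns) {
    const stream = redis.scanStream({ match: pattern, count: 100 });
    for await (const keys of stream) {
      if (keys.length > 0) await redis.del(...keys);
    }
  }
};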
When fetching flashcards, the system first checks the cache, saving significant processing time and API costs for repeat requests:
// Get flashcards for a specific song
const getFlashcardsForSong = async (req, res) => {
  try {
    const songTitle = req.query.song;
    const forceLanguage = req.query.lang; // Optional override parameter
    const userId = req.userId;
    const cacheKeys = generateCacheKeys(userId, songTitle, null, forceLanguage);

    // Check Redis cache first
    const cachedFlashcards = await redis.get(cacheKeys.flashcards);
    if (cachedFlashcards) {
      console.log(`⚡ Serving flashcards from cache for: ${songTitle}`);
      return res.json(JSON.parse(cachedFlashcards));
    }

    // If not in cache, proceed with generation...
    // [Lyrics and translation processing logic]

    // Cache the flashcards (without expiration)
    await redis.set(cacheKeys.flashcards, JSON.stringify(flashcards));
    console.log(`💾 Cached flashcards for: ${songTitle}`);
    res.json(flashcards);
  } catch (error) {
    console.error("❌ Error fetching flashcards:", error);
    res.status(500).json({ error: "Failed to generate flashcards." });
  }
};
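Caching without expiration is a deliberate choice: a song's lyrics and translations don't change, so entries never go stale. If memory pressure ever became a concern, a TTL could be added at write time; a one-line variation (the 30-day window is an arbitrary value for illustration):

// Same write, but with an expiry — ioredis accepts the EX option inline
await redis.set(cacheKeys.flashcards, JSON.stringify(flashcards), "EX", 60 * 60 * 24 * 30);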
Solution 2: Lyric Deduplication Algorithm
The second optimization addressed a common characteristic of song lyrics: repetition. Choruses, refrains, and repeated lines could account for 30-60% of a song's content. Translating each instance separately was inefficient and costly.
I designed a deduplication algorithm that identifies unique lines while preserving the complete song structure for flashcard generation:
// Step 1: Create a map of unique lines for translation efficiency
const uniqueLines = new Map(); // Maps text to index in uniqueArray
const uniqueArray = []; // Stores unique lyric lines
const originalToUnique = new Map(); // Maps original position to unique line index
// Identify unique lines while preserving all original lines
frontLines.forEach((line, index) => {
  const trimmedLine = line.trim();
  if (!uniqueLines.has(trimmedLine)) {
    // New unique line found
    uniqueLines.set(trimmedLine, uniqueArray.length);
    uniqueArray.push(trimmedLine);
  }
  // Map this position to its corresponding unique line
  originalToUnique.set(index, uniqueLines.get(trimmedLine));
});
// Calculate and log translation optimization stats
const totalLines = frontLines.length;
const uniqueCount = uniqueArray.length;
const savedLines = totalLines - uniqueCount;
const percentSaved = Math.round((savedLines / totalLines) * 100);
console.log(`🔍 Translation Optimization: ${totalLines} total lines → ${uniqueCount} unique lines to translate (saved ${percentSaved}% API usage)`);
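To make the mapping concrete, here is the algorithm run on a toy lyric with a repeated chorus (the lines are invented for illustration):

// Toy input: 6 lines with a repeated chorus → only 4 unique lines get translated
const frontLines = [
  "Vamos a bailar",     // chorus → unique index 0
  "La noche es joven",  // unique index 1
  "Vamos a bailar",     // duplicate → maps back to index 0
  "El ritmo nos llama", // unique index 2
  "Vamos a bailar",     // duplicate → maps back to index 0
  "Hasta el amanecer",  // unique index 3
];
// After the loop above:
// uniqueArray      → ["Vamos a bailar", "La noche es joven", "El ritmo nos llama", "Hasta el amanecer"]
// originalToUnique → 0→0, 1→1, 2→0, 3→2, 4→0, 5→3
// savedLines = 6 - 4 = 2, i.e. 33% fewer lines sent to the translation API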
After identifying unique lines, I implemented batch processing to further optimize API requests:
// Step 3: Translate only the unique lines in batches
const BATCH_SIZE = 10; // Number of lines to translate in each API call
const uniqueTranslations = []; // Collects translations in unique-line order

// Process unique lines in batches
for (let i = 0; i < uniqueArray.length; i += BATCH_SIZE) {
  const batch = uniqueArray.slice(i, i + BATCH_SIZE);
  console.log(`🔤 Translating unique batch ${Math.floor(i / BATCH_SIZE) + 1} (${batch.length} lines)`);
  // Send the batch for translation with the detected/forced language
  const translatedBatch = await translateBatch(batch, detectedOrForcedLanguage);
  // Add translated lines to unique translations array
  uniqueTranslations.push(...translatedBatch);
}

// Store translations in Redis (without expiration)
await redis.set(translationsCacheKey, JSON.stringify(uniqueTranslations));
console.log(`💾 Cached translations for: ${songTitle}`);
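The translateBatch helper itself is elided above. A minimal sketch of what such a helper might look like, assuming a DeepL-style HTTP API — the endpoint, auth header, and response shape are assumptions about the provider, not LyricLingo's actual code:

// Hypothetical batch translator — provider details are assumed, not from the original
const translateBatch = async (lines, sourceLang) => {
  const response = await fetch("https://api-free.deepl.com/v2/translate", {
    method: "POST",
    headers: {
      "Authorization": `DeepL-Auth-Key ${process.env.DEEPL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: lines,             // One request covers the whole batch
      source_lang: sourceLang, // Detected or forced upstream
      target_lang: "EN",
    }),
  });
  if (!response.ok) throw new Error(`Translation API error: ${response.status}`);
  const data = await response.json();
  return data.translations.map((t) => t.text);
};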
Finally, the algorithm maps translations back to the original song structure, creating flashcards that maintain the full song's flow:
// Step 4: Map translations back to ALL original lines (including duplicates)
const backLines = frontLines.map((_, index) => {
  // Get the unique line index for this position
  const uniqueIndex = originalToUnique.get(index);
  // Return the translation for this unique line
  return uniqueTranslations[uniqueIndex] || "Translation unavailable";
});

// Create flashcards with proper alignment, keeping ALL lines in order
let flashcards = frontLines.map((line, index) => {
  return {
    front: line.trim(),
    // Extra cleaning to remove any potential artifact characters from translation
    back: backLines[index].trim().replace(/\|+/g, '').trim()
  };
});
Results and Impact
These optimizations delivered significant performance and cost improvements:
- API Cost Reduction: Lyric deduplication reduced translation API calls by 40% on average, with some songs seeing 60%+ reduction.
- Response Time: Cache hits reduced response times from ~800ms to ~200ms, a 75% improvement.
- User Experience: Returning users experience near-instant flashcard generation for previously viewed songs.
- Scalability: The Redis implementation allows for horizontal scaling as user base grows.
Other Implementation Considerations
Beyond the core optimizations, several additional refinements were necessary:
Section Marker Removal
Song lyrics often contain section markers (like [Verse], [Chorus]) that don't contribute to language learning. I implemented a cleanup routine in the lyrics service:
const scrapeLyrics = async (lyricsUrl) => {
  try {
    // Fetch and initial parsing...
    let lyrics = ""; // Accumulates cleaned lyric text

    // Process the content line by line
    const lines = html.split('\n');
    let processedLines = [];
    for (let line of lines) {
      // Skip empty lines
      if (!line.trim()) continue;
      // Remove HTML tags while preserving text content
      let textLine = line.replace(/<[^>]+>/g, '');
      textLine = textLine.trim();
      // Skip section markers, quoted or unquoted
      // This handles patterns like: [Verse], "[Verse]", [Verse: Artist]
      if (/^"?\[.*?\]"?$/.test(textLine)) continue;
      // Remove any section markers within a line
      // This handles mixed content like "Some lyrics [Bridge] more lyrics"
      textLine = textLine.replace(/"?\[.*?\]"?/g, '').trim();
      // Skip if line became empty after removing section markers
      if (textLine.length === 0) continue;
      processedLines.push(textLine);
    }

    // Join processed lines with newlines
    lyrics += processedLines.join('\n') + '\n';
    // Further cleanup...
    return lyrics;
  } catch (error) {
    console.error("❌ Error scraping lyrics:", error);
    return null;
  }
};
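A quick illustration of what the two regexes catch (the sample lines are invented):

// Whole-line markers are skipped entirely
/^"?\[.*?\]"?$/.test('[Chorus]');                // true → line dropped
/^"?\[.*?\]"?$/.test('"[Verse 2: Bad Bunny]"');  // true → line dropped
// Inline markers are stripped while the lyric text survives
'Some lyrics [Bridge] more lyrics'.replace(/"?\[.*?\]"?/g, '').trim();
// → "Some lyrics  more lyrics"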
Language Detection
To optimize translation, the system automatically detects the source language:
// Step 2: Automatically detect language from the combined lyrics
// Use forceLanguage if explicitly provided in the request
let detectedOrForcedLanguage = forceLanguage;
if (!detectedOrForcedLanguage) {
  // Create a sample text for language detection (first 10 unique lines or fewer;
  // slice clamps to the array length, so no explicit bounds check is needed)
  const sampleText = uniqueArray.slice(0, 10).join(" ");
  detectedOrForcedLanguage = languageDetector.detectLanguage(sampleText);
  console.log(`🔤 Auto-detected language: ${detectedOrForcedLanguage}`);
} else {
  console.log(`🔤 Using forced language: ${detectedOrForcedLanguage}`);
}
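The languageDetector module is elided above; one way to build it, assuming the franc npm package maps onto the supported languages (LyricLingo's actual detector may differ):

// Hypothetical detector built on franc (franc v6+ is ESM-only, hence the import)
import { franc } from "franc";

// Map franc's ISO 639-3 codes onto the app's language codes
const ISO_TO_APP = {
  spa: "ES", fra: "FR", por: "PT", ita: "IT", deu: "DE",
  jpn: "JA", cmn: "ZH", rus: "RU", kor: "KO",
};

const detectLanguage = (sampleText) => {
  const iso = franc(sampleText); // Returns "und" when no language can be determined
  return ISO_TO_APP[iso] || "ES"; // Default fallback is my assumption
};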
Quality Filtering
Not all generated flashcards are useful for learning. I implemented quality filters to ensure learners see only the most helpful content:
// Track filtering statistics for debugging
const initialCount = flashcards.length;
let emptyCount = 0;
let identicalCount = 0;
let sectionMarkerCount = 0;
// Track identical translations but keep them in the results
flashcards.forEach(card => {
  if (card.front === card.back) {
    identicalCount++;
    // Optionally add a marker to the card
    card.isIdentical = true;
  }
});

// Final filter to ensure quality flashcards with detailed tracking
flashcards = flashcards.filter(card => {
  // Check for empty front or back
  if (card.front.length === 0 || card.back.length === 0) {
    emptyCount++;
    return false;
  }
  // Check for section markers that might have slipped through
  if (/^\[.*\]$/.test(card.front) || /^\[.*\]$/.test(card.back)) {
    sectionMarkerCount++;
    return false;
  }
  return true;
});

// Surface the stats tracked above
console.log(`🧹 Filtered flashcards: ${initialCount} → ${flashcards.length} (${emptyCount} empty, ${sectionMarkerCount} section markers, ${identicalCount} identical kept)`);
Frontend Integration
The frontend needed to intelligently interact with the optimized backend. The Flashcards component was designed to respect cache lifecycles and provide visual feedback during loading:
// Effect to fetch flashcards when song or language changes
useEffect(() => {
  if (selectedSong && isAuthenticated()) {
    fetchFlashcards();
  }
}, [selectedSong, selectedLanguage]);

// Enhanced fetchFlashcards with user-specific handling and improved error handling
const fetchFlashcards = async () => {
  if (!selectedSong) return;

  // Validate user authentication
  if (!isAuthenticated()) {
    setToast({
      show: true,
      message: "Please log in to access flashcards",
      type: "warning"
    });
    navigate("/login");
    return;
  }

  setIsLoadingCards(true);
  try {
    // Construct URL with language parameter
    let url = `${backendUrl}/api/songs/flashcards?song=${encodeURIComponent(selectedSong.song)}`;
    if (selectedLanguage !== "auto") {
      url += `&lang=${selectedLanguage}`;
    }
    // Add user ID for server-side validation
    const userId = getUserId();
    if (userId) {
      url += `&userId=${userId}`;
    }

    const data = await apiGet(url);
    if (data.error) {
      throw new Error(data.error);
    }
    setFlashcards(data);

    // Update detected language
    if (data.length > 0 && data[0].detectedLanguage) {
      setDetectedLanguage(data[0].detectedLanguage);
    }
  } catch (error) {
    console.error("Error fetching flashcards:", error);
    // Only show toast for actual errors, not for "song not found"
    if (!error.message.includes("Song not found")) {
      setToast({
        show: true,
        message: error.message || "Failed to load flashcards",
        type: "error"
      });
    }
  } finally {
    setIsLoadingCards(false);
  }
};
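The apiGet helper isn't shown in the post; a minimal sketch of the wrapper it implies, assuming a bearer token stored client-side (the auth scheme is my assumption):

// Hypothetical fetch wrapper — token handling is assumed, not from the original code
const apiGet = async (url) => {
  const token = localStorage.getItem("token"); // Assumed storage location
  const response = await fetch(url, {
    headers: token ? { Authorization: `Bearer ${token}` } : {},
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
};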
Conclusion
By implementing Redis caching and lyric deduplication, LyricLingo evolved from a simple concept into a robust, efficient language learning tool. These optimizations not only improved performance but also made the service more economically viable by dramatically reducing API costs.
The technical choices highlight important principles for any web application:
- Identify Patterns: Understanding the repetitive nature of song lyrics revealed optimization opportunities.
- Minimize External Calls: The deduplication strategy significantly reduced translation API requests.
- Persistent Caching: Redis provided a scalable solution for preserving expensive computations.
- User-Specific Optimization: Namespacing caches by user ID maintained data privacy while enabling personalization.
These techniques aren't limited to lyric translation—they can be applied to any application needing to optimize API usage, especially when dealing with text processing, translations, or any data with inherent repetition patterns.
Check out LyricLingo to see these optimizations in action, or explore the GitHub repository for a deeper dive into the code.