TLD2 Architecture Reference
Complete technical documentation for Chrome Extension Manifest V3 with local AI, neural TTS, and CSP-compliant architecture
Last updated: January 2025 • For developers & technical users
Table of Contents
- System Overview
- Manifest V3 Structure
- Component Architecture
- Data Flow & Message Passing
- Content Extraction (Readability.js)
- AI Summarization Pipeline
- TTS Integration (StreamingKokoroJS)
- Storage Architecture
- CSP Compliance & Static Bundling
- Permissions & Security
- Performance Optimizations
- Technology Stack
System Overview
TLD2 is a privacy-first Chrome extension that provides AI-powered article summarization with high-quality neural text-to-speech. The architecture emphasizes local execution, CSP compliance, and modern web technologies.
Core Capabilities
- Content Extraction: Readability.js parses web pages to isolate main article content
- AI Summarization: Chrome's built-in AI or Google Gemini API (optional)
- Neural TTS: StreamingKokoroJS with GPU acceleration (WebGPU/WASM)
- Privacy-First: 100% local processing by default, no telemetry
High-Level Data Flow
Manifest V3 Structure
TLD2 uses Chrome Extension Manifest V3 with a strict Content Security Policy (CSP) for extension pages.
Critical CSP Configuration
'wasm-unsafe-eval' is required for ONNX Runtime Web to execute WebAssembly. This is the only CSP exception needed and is necessary for neural network inference in the browser.
Security Note: This directive only allows WebAssembly execution, not arbitrary JavaScript eval(). All JavaScript remains statically bundled and verified.
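A minimal sketch of the corresponding manifest entry, assuming the standard Manifest V3 content_security_policy.extension_pages key. It is written as a JavaScript object so it can be checked programmatically; in the extension it would live in manifest.json. Only 'wasm-unsafe-eval' comes from the text above, and the remaining directives are the usual Manifest V3 defaults rather than confirmed TLD2 values. The helper allowsWasmOnly is hypothetical.

```javascript
// Sketch of the CSP portion of manifest.json, expressed as a JS object.
// Only 'wasm-unsafe-eval' is taken from the documentation; the other
// directives are assumed Manifest V3 defaults.
const manifestCsp = {
  manifest_version: 3,
  content_security_policy: {
    extension_pages: "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'",
  },
};

// Hypothetical helper: true when script-src permits WebAssembly compilation
// without also permitting JavaScript eval().
function allowsWasmOnly(policy) {
  const scriptSrc =
    policy
      .split(";")
      .map((directive) => directive.trim())
      .find((directive) => directive.startsWith("script-src")) || "";
  const sources = scriptSrc.split(/\s+/).slice(1);
  return (
    sources.includes("'wasm-unsafe-eval'") && !sources.includes("'unsafe-eval'")
  );
}

console.log(JSON.stringify(manifestCsp, null, 2));
```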
Component Architecture
1. Background Service Worker
File: background/service-worker.js
Role: Orchestrates extension lifecycle, message routing, and side effects
Responsibilities:
- Initialize extension on install/startup
- Create context menu items
- Handle icon clicks (open sidebar)
- Route messages between content scripts and sidebar
- Manage chrome.storage operations
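The first three responsibilities could be wired up roughly as follows. This is a sketch, not the actual service worker: the menu id is hypothetical, and the chrome object is passed in as a parameter so the wiring can be exercised outside the browser. The chrome.* calls themselves (runtime.onInstalled, contextMenus.create, action.onClicked, sidePanel.open) are standard extension APIs.

```javascript
// Sketch of background/service-worker.js wiring.
const MENU_ID = "tld2-summarize"; // hypothetical id

function registerServiceWorker(chromeApi) {
  // Create the context menu once, when the extension is installed or updated.
  chromeApi.runtime.onInstalled.addListener(() => {
    chromeApi.contextMenus.create({
      id: MENU_ID,
      title: "Summarize",
      contexts: ["page", "selection"],
    });
  });

  // Open the side panel for the tab whose toolbar icon was clicked.
  chromeApi.action.onClicked.addListener((tab) => {
    chromeApi.sidePanel.open({ tabId: tab.id });
  });

  // Open the side panel when the context menu item is chosen.
  chromeApi.contextMenus.onClicked.addListener((info, tab) => {
    if (info.menuItemId === MENU_ID && tab) {
      chromeApi.sidePanel.open({ tabId: tab.id });
    }
  });
}

// Register only when running inside the extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onInstalled) {
  registerServiceWorker(chrome);
}
```

Listeners are registered synchronously at the top level because a Manifest V3 service worker can be shut down and restarted at any time; handlers attached later may miss the event that woke the worker.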
2. Content Script
File: content/content-script.js
Role: Executes in web page context to extract article content
Injection Strategy:
- Dynamic Injection: Injected when sidebar opens (not on every page load)
- Isolated World: Runs in separate JavaScript context from page scripts
- Message-Based: Communicates with background via chrome.runtime.sendMessage
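Dynamic injection can be sketched with chrome.scripting.executeScript. The function below is illustrative: the "EXTRACT_ARTICLE" message type is hypothetical, and the request/response style shown here is only one way the content script might hand its result back.

```javascript
// Sketch: inject the content script into a tab on demand, then ask it for
// the article. Needs the "scripting" permission plus activeTab access.
async function injectAndExtract(chromeApi, tabId) {
  await chromeApi.scripting.executeScript({
    target: { tabId },
    files: ["content/content-script.js"],
  });
  // chrome.tabs.sendMessage does not require the "tabs" permission.
  // The injected script is expected to reply with the extracted article.
  return chromeApi.tabs.sendMessage(tabId, { type: "EXTRACT_ARTICLE" });
}

// In the extension: const article = await injectAndExtract(chrome, tab.id);
```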
3. Sidebar UI
Files: sidepanel/sidepanel.html, sidepanel/sidepanel.js, sidepanel/sidepanel.css
Role: Main user interface for displaying summaries and controlling TTS
Architecture Pattern:
- Single Page Application (SPA): No page reloads, dynamic content updates
- Event-Driven: Button clicks, slider changes trigger handlers
- State Management: Local state object tracks playback, settings, content
Key UI Components:
| Component | Purpose | Implementation |
|---|---|---|
| Menu Bar | Copy, share, settings actions | Button group with icon + text |
| Playbar | TTS playback controls | Play/pause, progress bar, shuttle controls |
| Summary Display | Show streaming summary text | Scrollable div with fade-in animation |
| Progress Bar | Visual playback position | Two-layer: played + buffered |
| Settings Panel | Configure TTS, AI, preferences | Slide-down modal overlay |
Data Flow & Message Passing
Message Passing Architecture
Chrome Manifest V3 requires asynchronous message passing between extension components. TLD2 uses a structured message protocol:
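One plausible shape for such a protocol is a typed envelope plus a router that dispatches on the type field. The message type names and the envelope fields below are hypothetical; only the chrome.runtime.onMessage listener contract (return true to keep the channel open for an asynchronous sendResponse) is standard.

```javascript
// Hypothetical message types.
const MessageType = Object.freeze({
  EXTRACT_ARTICLE: "EXTRACT_ARTICLE",
  ARTICLE_EXTRACTED: "ARTICLE_EXTRACTED",
  SETTINGS_UPDATED: "SETTINGS_UPDATED",
});

// Build a message envelope, rejecting unknown types early.
function createMessage(type, payload = {}) {
  if (!Object.values(MessageType).includes(type)) {
    throw new Error(`Unknown message type: ${type}`);
  }
  return { type, payload, timestamp: Date.now() };
}

// Build a listener suitable for chrome.runtime.onMessage.addListener.
function createRouter(handlers) {
  return (message, sender, sendResponse) => {
    const type = message && message.type;
    const handler = Object.prototype.hasOwnProperty.call(handlers, type)
      ? handlers[type]
      : undefined;
    if (!handler) return false; // not handled here
    Promise.resolve()
      .then(() => handler(message.payload, sender))
      .then((result) => sendResponse({ ok: true, result }))
      .catch((error) => sendResponse({ ok: false, error: String(error) }));
    return true; // keep the channel open for the async response
  };
}

// In the service worker:
// chrome.runtime.onMessage.addListener(createRouter({ ...handlers }));
```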
Message Flow Examples
1. Content Extraction Flow
2. Settings Update Flow
Content Extraction (Readability.js)
Readability.js Integration
Library: Mozilla's @mozilla/readability (v0.5.0)
Purpose: Parse web pages to extract main article content, removing ads, navigation, and clutter
How It Works:
- Clone Document: Create deep clone of current DOM to avoid modifying page
- Parse HTML: Algorithm analyzes HTML structure, content density, and semantic markup
- Score Content: Assigns scores to page elements based on article likelihood
- Extract Article: Returns highest-scoring content as plain text
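The steps above map onto a short call sequence. In this sketch the Readability class from @mozilla/readability is passed in as a parameter (ReadabilityCtor) so the function can be tested without a browser; the return shape with ok, title, and text is an illustrative choice, not TLD2's documented format.

```javascript
// Sketch of the extraction step.
function extractArticle(doc, ReadabilityCtor) {
  // Readability mutates the document it parses, so work on a deep clone.
  const clone = doc.cloneNode(true);
  const article = new ReadabilityCtor(clone).parse();

  // parse() returns null when no article-like content is found.
  if (!article || !article.textContent || !article.textContent.trim()) {
    return { ok: false, error: "No article content found" };
  }
  return {
    ok: true,
    title: article.title,
    text: article.textContent.trim(), // plain text, markup stripped
  };
}

// In the content script: extractArticle(document, Readability)
```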
Handling Edge Cases
- Dynamic Content: Wait for lazy-loaded content before extraction
- Single-Page Apps: May require manual scrolling to trigger content load
- Paywalls: Extraction fails if content is behind login—returns error gracefully
- Non-Article Pages: Search results, index pages return minimal content—user warned
AI Summarization Pipeline
Dual-Mode Architecture
TLD2 supports two summarization backends with automatic fallback:
Mode 1: Local AI (Chrome Built-in)
API: chrome.ai.summarizer (Chrome 120+)
Advantages: 100% private, offline, no API costs
Mode 2: Cloud AI (Google Gemini)
API: Google Gemini 2.5 Flash Lite
Advantages: Higher quality, better comprehension of technical content
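Automatic fallback between the two modes might look like the sketch below. It assumes the local backend is tried first and the cloud backend is used only when one is configured; both backends are hypothetical adapters with the shape { isAvailable(), summarize(text) }, so no specific Chrome or Gemini call is implied.

```javascript
// Sketch of backend selection with fallback (local first, then cloud).
async function summarizeWithFallback(text, { local, cloud }) {
  const errors = [];
  for (const [name, backend] of [["local", local], ["cloud", cloud]]) {
    if (!backend) continue; // e.g. no Gemini API key configured
    try {
      if (await backend.isAvailable()) {
        return { backend: name, summary: await backend.summarize(text) };
      }
      errors.push(`${name}: unavailable`);
    } catch (error) {
      errors.push(`${name}: ${error.message}`);
    }
  }
  throw new Error(`No summarization backend succeeded (${errors.join("; ")})`);
}
```

Returning the backend name alongside the summary lets the UI tell the user which mode produced the text, which matters for a privacy-first tool.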
Streaming Display
Summaries are displayed with streaming animation (simulated LLM-style output at 120 tokens/second):
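A simple way to simulate this is to reveal the text one token at a time on a fixed timer. The sketch below approximates a token as a whitespace-delimited word, which is an assumption; the function names are hypothetical.

```javascript
const TOKENS_PER_SECOND = 120; // rate stated above

// Split into word-plus-trailing-whitespace tokens so that joining the
// tokens reproduces the input exactly.
function splitTokens(text) {
  return text.match(/\S+\s*|\s+/g) || [];
}

// Reveal tokens one at a time, calling onToken with the text shown so far.
async function streamText(text, onToken, rate = TOKENS_PER_SECOND) {
  const delayMs = 1000 / rate;
  let shown = "";
  for (const token of splitTokens(text)) {
    shown += token;
    onToken(shown);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return shown;
}

// In the sidebar: streamText(summary, (t) => { summaryEl.textContent = t; });
```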
TTS Integration (StreamingKokoroJS)
See TTS Implementation Documentation for complete technical details.
High-Level Integration
- Library: StreamingKokoroJS (Kokoro 82M ONNX model)
- Model: model_q8f16.onnx (86MB, quantized)
- Backend: WebGPU (preferred) or WASM (fallback)
- Voices: af_sky, af_nicole, bm_fable, bm_lewis
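Backend selection can be sketched with the standard WebGPU entry point, navigator.gpu.requestAdapter(), which resolves to null when no usable adapter exists. The function name pickBackend and the returned strings are illustrative; how the result is passed to the TTS library is not shown here.

```javascript
// Sketch: prefer WebGPU, fall back to WASM.
async function pickBackend(nav = globalThis.navigator) {
  try {
    if (nav && nav.gpu) {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    }
  } catch (error) {
    // Fall through to WASM if the adapter request throws.
  }
  return "wasm";
}
```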
Storage Architecture
chrome.storage.local vs. chrome.storage.sync
| Storage Type | Use Case | Limit | Syncs Across Devices? |
|---|---|---|---|
| local | Cached models, large data | 10 MB | No |
| sync | User settings, preferences | 100 KB | Yes (if Chrome Sync enabled) |
Storage Schema:
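As an illustration only, a settings schema for the sync area might look like the sketch below. Every key name and default value here is hypothetical. The storage area is passed in as a parameter; in the extension it would be chrome.storage.sync, whose get() accepts an object of defaults and whose get()/set() return promises under Manifest V3.

```javascript
// Hypothetical schema: key names and defaults are illustrative.
const DEFAULT_SETTINGS = Object.freeze({
  summarizer: "local", // "local" | "gemini"
  voice: "af_sky", // one of the voices listed in the TTS section
  playbackRate: 1.0,
});

// get(defaults) fills in any key that has not been saved yet.
async function loadSettings(storageArea) {
  return storageArea.get(DEFAULT_SETTINGS);
}

// Reject keys outside the schema so typos do not silently persist.
async function saveSettings(storageArea, patch) {
  const unknown = Object.keys(patch).filter((key) => !(key in DEFAULT_SETTINGS));
  if (unknown.length > 0) {
    throw new Error(`Unknown setting(s): ${unknown.join(", ")}`);
  }
  await storageArea.set(patch);
}

// In the extension: const settings = await loadSettings(chrome.storage.sync);
```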
CSP Compliance & Static Bundling
Content Security Policy Challenges
Manifest V3's strict CSP prevents:
- ❌ Loading scripts from CDNs
- ❌ Using eval() or new Function()
- ❌ Inline scripts without hashes
- ❌ Remote model downloads at runtime
TLD2's CSP Solutions
1. Static Library Bundling
All dependencies bundled at build time using esbuild:
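A build script along these lines would bundle each entry point into a single file with no runtime imports. The entry points, output directory, output format, and target below are assumptions rather than TLD2's actual configuration; esbuild.build() and the option names are part of esbuild's JavaScript API.

```javascript
// Sketch of a build script (for example build.js).
const buildOptions = {
  entryPoints: [
    "src/background/service-worker.js", // assumed source layout
    "src/content/content-script.js",
    "src/sidepanel/sidepanel.js",
  ],
  bundle: true, // inline every import so nothing is fetched at runtime
  format: "iife", // assumed; plain scripts work in every extension context
  target: "chrome120", // assumed minimum Chrome version
  outdir: "dist",
  outbase: "src", // keep the background/, content/, sidepanel/ layout
  minify: true,
};

// The esbuild module is passed in: runBuild(require("esbuild"))
function runBuild(esbuild, options = buildOptions) {
  return esbuild.build(options);
}
```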
2. Local Model Storage
ONNX models included in extension package:
3. WASM Path Override
Point ONNX Runtime to local WASM files:
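Items 2 and 3 can be sketched together, assuming models are loaded through the Transformers.js env object from @xenova/transformers. The settings allowRemoteModels, localModelPath, and backends.onnx.wasm.wasmPaths are real Transformers.js options; the directory names "models/" and "wasm/" and the function name are hypothetical.

```javascript
// Sketch: resolve models and ONNX Runtime WASM binaries from the package.
// `env` is the object exported by @xenova/transformers; `getURL` maps a
// package-relative path to an extension URL.
function configureLocalAssets(env, getURL) {
  env.allowRemoteModels = false; // never fetch models from a remote hub
  env.localModelPath = getURL("models/"); // hypothetical directory
  // A string value is used as the URL prefix for the runtime's .wasm files.
  env.backends.onnx.wasm.wasmPaths = getURL("wasm/"); // hypothetical directory
  return env;
}

// In the extension:
// configureLocalAssets(env, (path) => chrome.runtime.getURL(path));
```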
Permissions & Security
Required Permissions Breakdown
| Permission | Purpose | Security Implication |
|---|---|---|
| activeTab | Read content of active tab when TLD2 is triggered | Only accesses pages user explicitly activates TLD2 on |
| scripting | Inject Readability.js content script | Required for content extraction; runs in isolated context |
| storage | Save user settings and cache models | Local only; no remote sync without user consent |
| sidePanel | Display sidebar UI | No sensitive data access; UI-only permission |
| contextMenus | Add right-click "Summarize" option | No data access; UX enhancement only |
Security Best Practices
- ✅ No remote code execution
- ✅ API keys stored locally with encryption (via chrome.storage)
- ✅ No analytics or telemetry
- ✅ No external network requests (in local AI mode)
- ✅ Minimal permissions (no <all_urls> or tabs)
Performance Optimizations
1. Lazy Loading
Models loaded only when TTS is first used:
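A common way to get this behavior is to memoize the loader's promise, so the first caller triggers the load and later callers share the result. The helper below is a generic sketch; loadKokoroModel in the usage comment is a hypothetical name.

```javascript
// Sketch: wrap an expensive loader so it runs at most once.
function createLazyLoader(load) {
  let pending = null;
  return function get() {
    if (!pending) {
      // The Promise constructor also captures synchronous throws from load().
      pending = new Promise((resolve) => resolve(load())).catch((error) => {
        pending = null; // allow a retry after a failed load
        throw error;
      });
    }
    return pending;
  };
}

// Hypothetical usage:
// const getTts = createLazyLoader(() => loadKokoroModel());
// playButton.addEventListener("click", async () => (await getTts()).speak(text));
```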
2. Streaming Architecture
Summary text streams to UI while TTS chunks process in parallel:
- Summary displays incrementally (perceived speed boost)
- TTS generates the first audio chunk while later text is still streaming
- Playback starts before the full article is synthesized
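The chunked pipeline can be sketched as a sentence splitter feeding an async generator. The splitter is deliberately naive (it breaks on ".", "!" and "?", so decimals and abbreviations are split too), and synthesize is a hypothetical function that turns one chunk of text into audio.

```javascript
// Split text into sentence-sized chunks for incremental synthesis.
function splitSentences(text) {
  return (text.match(/[^.!?]+[.!?]+|\S[^.!?]*$/g) || [])
    .map((sentence) => sentence.trim())
    .filter(Boolean);
}

// Yield audio chunk by chunk so playback can start after the first one.
async function* synthesizeStream(text, synthesize) {
  for (const sentence of splitSentences(text)) {
    yield { sentence, audio: await synthesize(sentence) };
  }
}

// Hypothetical usage:
// for await (const { audio } of synthesizeStream(summary, tts)) enqueue(audio);
```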
3. GPU Acceleration
WebGPU provides 2-10x speedup over WASM for TTS synthesis. See GPU Acceleration Guide for details.
Technology Stack
| Technology | Version | Purpose |
|---|---|---|
| Chrome Extension API | Manifest V3 | Extension framework |
| @mozilla/readability | 0.5.0 | Article content extraction |
| kokoro-js | 1.0.0 | Neural TTS engine |
| @xenova/transformers | 2.17.2 | ML model loading (Transformers.js) |
| onnxruntime-web | 1.20.0 | Neural network inference |
| esbuild | 0.25.10 | Build tool & bundler |
Build Process
Further Reading
For implementation details on specific subsystems:
- TTS Implementation - Complete StreamingKokoroJS integration guide
- API Reference - Internal API documentation
- GPU Acceleration - WebGPU optimization strategies
This architecture documentation is maintained as part of the TLD2 open development process. For questions or contributions, contact 0xdespot@0xdespot.com.