3,800+ lectures, morning walks, room conversations, and interviews — with full transcripts, audio playback, and upcoming AI-powered word-level sync. Hear and read Srila Prabhupada simultaneously.
Every lecture transcribed from the original VedaBase 2025 export. Speaker markers, Sanskrit terms in italics, scripture references linked. Meticulously cleaned up over years — still a work in progress.
MP3 audio for every recording hosted at media.prabhupada.io. Playable inline with the transcript. Speed control, position memory, chapter-based audiobook queues.
Every lecture tagged: date, location, type (BG class, SB class, morning walk, room conversation, initiation, arrival address), speaker markers, scripture references.
21-test formatting suite validating markdown structure. Fixing italic pairing, diacritical marks, speaker markers, wiki link formatting. 3,800 files, thousands of issues being resolved systematically.
The next major step: matching every word in the transcript to its exact moment in the audio. This enables highlighted-as-spoken reading, quotable audio clips, and searchable audio.
Run each audio file through a speech-to-text model (Whisper or Gemini) that produces a timestamped transcript. This gives us word-level timecodes — but the text won't match our cleaned transcripts perfectly.
Output: raw AI transcript with timestamps per word/phrase
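The raw output lands as segments with per-word timing. A minimal sketch of flattening that into a word list, assuming Whisper-style output (the field names `words`, `word`, `start`, `end` follow Whisper's word-timestamp format; the sample data is illustrative):

```python
def flatten_words(segments):
    """Flatten segment-level speech-model output into (word, start, end) tuples."""
    words = []
    for seg in segments:
        for w in seg.get("words", []):
            # Whisper pads words with a leading space; strip it.
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words

# Mock data in the shape a word-timestamped transcription returns:
segments = [
    {"words": [
        {"word": " So", "start": 0.0, "end": 0.4},
        {"word": " Krishna", "start": 0.4, "end": 1.1},
    ]},
    {"words": [
        {"word": " says", "start": 1.2, "end": 1.6},
    ]},
]
print(flatten_words(segments))
```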
Our existing transcripts (from VedaBase, manually cleaned) are the source of truth for text. The AI transcript is the source of truth for timing. We align the two using sequence matching — transferring timestamps from the AI output onto our verified text.
This is the critical step: the AI might hear "Krishna" where our transcript has "Krsna" — the alignment handles these mismatches.
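The alignment above can be sketched with the standard library's sequence matcher — a simplified version, not the production aligner. Words are normalized (lowercased, diacritics folded) before matching, and same-length "replace" runs carry timing across, which is what rescues the "Krishna" vs "Krsna" case:

```python
import difflib
import unicodedata

def normalize(word):
    """Lowercase and fold diacritics so spelling variants differ less."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def align(verified_words, ai_words):
    """Transfer AI timestamps onto the verified transcript words.

    verified_words: words from the cleaned transcript (source of truth for text)
    ai_words: (word, start, end) tuples from the speech model (truth for timing)
    Returns (verified_word, start, end) per word; unmatched words get None.
    """
    a = [normalize(w) for w in verified_words]
    b = [normalize(w) for w, _, _ in ai_words]
    result = [(w, None, None) for w in verified_words]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' runs match outright; a same-length 'replace' run is the
        # "Krishna" heard for "Krsna" case -- carry the timing across anyway.
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            for k in range(i2 - i1):
                _, start, end = ai_words[j1 + k]
                result[i1 + k] = (verified_words[i1 + k], start, end)
    return result
```

The verified text is never altered; only timestamps flow from the AI side onto it.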
Produce standard SRT subtitle files for every lecture. Each subtitle entry maps a passage of text to a time range. These are universal — usable in any media player, embeddable in web players, parseable by apps.
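SRT is a plain-text format: a numbered entry, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. A minimal generator sketch (entry grouping is simplified to one passage per entry):

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(entries):
    """entries: list of (start_sec, end_sec, text) -> SRT file contents."""
    blocks = []
    for i, (start, end, text) in enumerate(entries, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Because the format is this simple, any media player, web `<track>` element, or mobile app can consume the result directly.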
Spot-check alignment accuracy. Flag sections where audio quality is poor (early recordings, background noise, multiple speakers talking over each other). Mark confidence levels per segment. Human review for flagged sections.
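One way to sketch the per-segment confidence marking, using the similarity between the cleaned transcript and what the model heard (the 0.8 threshold is an illustrative assumption, not a tuned value):

```python
import difflib

def segment_confidence(verified_text, ai_text):
    """Rough confidence score (0.0-1.0): word-level similarity between the
    cleaned transcript segment and the AI transcription of the same span."""
    return difflib.SequenceMatcher(
        a=verified_text.lower().split(),
        b=ai_text.lower().split(),
    ).ratio()

def flag_segments(pairs, threshold=0.8):
    """Return indices of (verified, ai) segment pairs scoring below the
    threshold -- these go into the human-review queue."""
    return [i for i, (v, a) in enumerate(pairs)
            if segment_confidence(v, a) < threshold]
```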
Once we have SRT files, the possibilities open up:
Current passage lights up as audio plays, like karaoke for lectures
Select a passage in the transcript, get a shareable audio clip of just that quote
Search for a phrase, jump to the exact moment in the audio where it's spoken
Compare what the AI hears against the existing transcript to catch transcription errors
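Phrase search in particular falls out of the SRT data almost for free — a sketch, assuming entries are held as (start, end, text) tuples (the sample entries are illustrative):

```python
def find_phrase(entries, phrase):
    """Return the start time of the first subtitle entry containing the
    phrase (case-insensitive), or None -- the player seeks to this time."""
    needle = phrase.lower()
    for start, _end, text in entries:
        if needle in text.lower():
            return start
    return None

entries = [
    (0.0, 4.2, "So this morning we are discussing"),
    (4.2, 9.8, "the process of devotional service"),
]
print(find_phrase(entries, "devotional service"))
```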
Transcripts exported from VedaBase 2025 — the authoritative source. Raw text with encoding issues, formatting inconsistencies, and legacy markup.
Years of meticulous cleanup: fixing character encoding, restoring Sanskrit diacriticals, structuring speaker markers, linking scripture references with wiki links, formatting stage directions.
Comprehensive 21-test formatting suite scanning all 3,800 files. Catches unpaired asterisks, broken italic spans, orphaned markers, misplaced speaker names. Thousands of issues identified and being resolved.
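One of the checks in such a suite might look like this — a simplified, illustrative sketch of the unpaired-asterisk test, not the actual implementation:

```python
def check_italics(line):
    """Flag lines whose italic markers don't pair up.

    Strips double-asterisk bold runs first, then requires an even number
    of remaining single asterisks. Returns True when well-formed.
    """
    remaining = line.replace("**", "")
    return remaining.count("*") % 2 == 0
```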
AI-powered audio sync: Whisper/Gemini transcription, alignment against verified text, SRT generation, highlight-as-spoken integration in VaniReader and prabhupada.io.
Every lecture is available now — with transcript and audio. Audio sync is the next step.
Browse Lectures at prabhupada.io