Amy and I have been teaching online for over a decade, and for more than half of that time, pre-recorded video lessons have been one of our cornerstone teaching tools of choice.
While we’ve helped thousands of students learn to build and launch products that sell, I’m a bit embarrassed to say that our favored format was leaving a large audience behind: without transcripts or closed captions, our courses have excluded a wide range of learners, most of all those who are deaf or hard of hearing.
This unintended exclusion is something we’ve been wanting to solve for a long time. For every one person who emailed us asking about captions or transcripts, we can assume there were many more who didn’t even bother asking.
Friend and 30x500 alumnus Joel Hooks from egghead.io has also been prodding and reminding us (in a friendly way) to add transcripts for a long time!
We’d set out to solve this problem in the past, but never took it across the finish line.
The biggest barrier was that every experience I’d had ordering production-quality transcripts online had yielded very disappointing results.
Naturally, price and quality are the two biggest factors, but even at top dollar the quality was variable at best. We could spend $2-3+ per minute and still need to hire an editor to clean them up. Frustrating!
But in 2020, a mix of new technology and our refreshed focus on investing in access converged…and we finally came up with a workable solution!
As of January, all 732 minutes of video in our flagship business course 30x500 are now fully transcribed and captioned!
And we didn’t stop there…every episode of our podcast Stacking the Bricks is also transcribed, with an interactive transcript embedded right on the episode page.
If you take a look at any of those podcast transcripts, you’ll see that the quality is top notch. These aren’t simply literal word-for-word transcriptions; they’ve been cleaned up by a human editor to remove speech tics and other language patterns that don’t map well from spoken to written word.
Since launching the podcast transcripts, and subsequently the captions and transcripts for 30x500 lessons, lots of people have asked how we did this and exactly how much it cost with hopes that they could make their course videos and podcasts more accessible too!
By combining Descript’s automated transcription with a freelance transcript editor, we were able to create thousands of minutes’ worth of extremely high quality transcripts and captions for less than $1.50 per minute, all in!
Lemme show you how we did it, step by step.
This guide includes the step-by-step process for finding and training a freelance editor, and the total cost per minute breakdown of the final product. Here we go!
- Step 1: Rough Transcripts for Pennies per Minute
- Step 2: Finding the right editor
- Step 3: Training to my tastes
- Step 4: Scale to Mass Production!
- Step 5: Re-sync the clean transcripts to the videos
- Don’t wait like we did.
Step 1: Rough Transcripts for Pennies per Minute
I (re)discovered Descript when I made the decision to bring back our Stacking the Bricks podcast for a season of new episodes based on my Tiny MBA Podcast Tour.
Descript is a piece of software that slurps in audio and video files and uses AI to auto-generate a transcript. The true magic of Descript is that it lets you use that transcript to edit your audio and video. Remove text from the transcript, and it’s removed from the audio. Move text around the doc, and the corresponding audio or video moves around the timeline to match. It’s truly magical, and saved me 3-4 hours of editing time per podcast episode.
The auto-generated transcripts are not perfect. Descript struggles most noticeably with transcribing proper nouns. For laughs I really should have saved a list of the many ways it’s transcribed Alex Hillman 😂.
That said…the auto-generated transcripts were surprisingly good, maybe 80-85%+ as good as the rough draft transcripts I’d gotten from various human transcription marketplaces even after paying upwards of $2 per minute.
And for just $30 per month, my Descript plan included 30 hours of transcribing per month.
Plus, Descript has the built-in capability of exporting .srt files for synced closed captions on our videos. That’s a two-for-one.
Once I realized I could get my rough transcripts for $1 per hour vs $1-2 per minute…hiring that human editor suddenly made way more sense!
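As a quick sanity check on that math, here is a minimal Python sketch of the comparison. The numbers are the ones quoted above ($30 per month for 30 hours, and $1-2 per minute on marketplaces); the function and variable names are made up for illustration, and it assumes every included hour actually gets used.

```python
# Figures quoted in this post; names are illustrative only.
DESCRIPT_PLAN_USD_PER_MONTH = 30.0
DESCRIPT_HOURS_PER_MONTH = 30.0
MARKETPLACE_USD_PER_MINUTE = (1.0, 2.0)  # the "$1-2 per minute" range


def cost_per_minute(monthly_price: float, included_hours: float) -> float:
    """Per-minute cost of a plan, assuming every included hour gets used."""
    return monthly_price / (included_hours * 60)


descript_per_minute = cost_per_minute(
    DESCRIPT_PLAN_USD_PER_MONTH, DESCRIPT_HOURS_PER_MONTH
)
low, high = MARKETPLACE_USD_PER_MINUTE

print(f"Descript rough transcript: ${descript_per_minute:.3f} per minute")
print(
    f"Marketplace rough draft: {low / descript_per_minute:.0f}x "
    f"to {high / descript_per_minute:.0f}x more"
)
```

At full usage that is under two cents per minute of audio, which is why the budget for a human editor suddenly opened up.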
Step 2: Finding the right editor
Even removing price from the equation, finding and hiring great people for any kind of work is one of the hardest tasks on the planet.
My go-to is tapping my network, so I started by asking around if anybody had a great transcript editor they’d worked with in the past. Unfortunately most people I talked to had run into the same challenges I had…and the one person who had a great person was hesitant to give ‘em up and risk losing their secret weapon.
So I went to the open market for freelancers of all kinds: Upwork!
Now, Upwork has a…shall we say “variable” reputation among technical and creative people. It tends to be highly competitive, with a focus on cheap labor rather than quality work.
Anecdotally, I personally know people who managed to escape the “pricing race to the bottom” by offering highly specialized services. I had a feeling that as long as I wasn’t optimizing for the cheapest, and positioned my project correctly, I’d find the person I was looking for.
I did a little bit of research to see what price range I might expect for transcript cleanup, and saw prices ranging from as little as $7 to as much as $20/hour. Even at the top end of that price range, this was looking affordable.
So I created a project description
Since I was working on the Stacking the Bricks podcast at the time, I had podcast episodes and their rough transcripts from Descript ready and waiting. I picked one and put together a job posting.
Clean up Audio Transcriptions - 40-60 mins each
Back catalog of ~35 podcasts with transcripts generated by Descript. Average length 40-60 mins each.
Looking for an independent freelancer to clean up the transcripts so they are clear to read and error free. Example visible at https://stackingthebricks.com/podcast/ep35-debugging-humans-with-michele-hansen-and-colleen-schnettler/
New episodes shipping weekly, happy to continue working with someone consistently every week once we see quality work.
I priced our project “high” on purpose
Knowing that the quality of transcriptionists on the platform varied widely, I specifically listed this project as paying $15-20/hr, hoping to signal that I was prioritizing quality over price and to attract higher quality candidates.
Within an hour I had a bunch of people applying to the project! Many quickly ruled themselves out with spelling errors, and several offered extremely low rates attempting to stand out by being cheaper.
I turned off new applications for the project, and plucked a few of the better applications.
I offered each freelancer a short, paid test project
Ultimately I was optimizing for quality over speed or price, so I wanted to see how a few different people would approach the project with relatively limited instructions.
I ended up picking 5 people whose messages were clearly not a copy/paste, and who had at least a few positive reviews from past projects.
Each of the 5 candidates was offered full rate to “clean up the first 15 minutes worth of transcript text so that it could easily be read and understood without the accompanying audio.”
I compared the results
Of the 5 candidates’ results, 3 had only cleaned up the most obvious typos, leaving in words and phrases that Descript had transcribed incorrectly.
The other two freelancers caught the vast majority of the language errors, but only one of them also noted that some of the run-on sentences would be easier to read if they were broken up into multiple sentences, with words and punctuation added to smooth readability.
She offered specific improvements in service of the goal, which was transcripts that were easy and pleasant to read.
She also happened to be the most expensive candidate, but given her outstanding performance on the test, she was the obvious freelancer to move forward with.
“Congrats, you’re hired!”
I paid everyone else, plus a small bonus.
Since I promised everyone that this was a paid test, I told them nicely that we had chosen another freelancer and paid for their time. Upwork even gave me the chance to pay a small bonus, so I gave the other 4 candidates a small additional bonus as a thank you for their effort.
Step 3: Training to my tastes
I had decided to use the podcast transcripts, which are decidedly lower stakes than our tightly edited video lessons, as the “training wheels” for the editing style I wanted.
Our freelancer had done a great job with the short test, so I asked her to finish that episode.
I specifically told her that I had appreciated her noticing opportunities to actually fix the text to make it easier to understand instead of just fixing spelling and grammatical errors, and that she should continue that way.
I said: “if you see ways to make this better, go ahead and ask, or even better just do it and make a note of what you did and why”
A few hours later I got back a near perfect transcript! She told me what she chose to fix and why, and I confirmed each of her decisions. I spotted a few small structural things, like how often I wanted her to add line breaks to break up long monologues.
She took notes on my preferences, and we decided to have her apply them to another 2 episodes of the podcast.
Once done, we reviewed those transcripts. Made a few more stylistic notes. Repeat.
Each time, I gave her a Dropbox folder with the original audio file + the rough transcript exported from Descript as a .rtf file. She sent back a fresh .rtf file with her initials in the filename.
Within 3 rounds, I felt extremely confident that she would deliver the rest of the podcast transcripts exactly the way I wanted them, or damn close to it!
A few weeks later, she wrapped up our entire podcast back catalog and we even came up with a publishing workflow for our new episodes going forward.
Bonus: Pull interesting quotes while you’re in there!
One of the notes I got from our freelancer was that she was really enjoying the podcast itself, and learning from the topics and discussion.
Given that she was both interested and curious, I realized that she might be able to pull out interesting quotes that could be turned into social media sharing clips while she was working through the transcript cleanup.
Step 4: Scale to Mass Production!
Unlike the podcast episodes, which were typically 30-40 minutes each, our lessons tend to be much shorter…there are just a lot of them. So I prepped a Dropbox folder that mirrored our lesson structure and started filling ‘em up with the video files. I batched using Descript to generate the rough transcripts, and uploaded each one next to the accompanying video.
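If you batch things the same way, a small script can tell you which videos still need a rough transcript sitting next to them. This is only a sketch under assumptions of my own choosing: videos are `.mp4` or `.mov` files, each transcript is an `.rtf` file with the same base name in the same folder, and the function name is hypothetical. Adjust the suffixes to match whatever your own folder actually contains.

```python
import sys
from pathlib import Path

# Assumed conventions for illustration: lesson videos are .mp4/.mov files and
# each rough transcript is an .rtf with the same base name in the same folder.
VIDEO_SUFFIXES = {".mp4", ".mov"}
TRANSCRIPT_SUFFIX = ".rtf"


def videos_missing_transcripts(root: Path) -> list[Path]:
    """Return videos under root that have no transcript file next to them."""
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES:
            if not path.with_suffix(TRANSCRIPT_SUFFIX).exists():
                missing.append(path)
    return missing


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_transcripts.py /path/to/lessons-folder
    for video in videos_missing_transcripts(Path(sys.argv[1])):
        print(f"No rough transcript yet: {video}")
```

Running it against the top-level lessons folder before handing things off is a cheap way to avoid sending your editor a video with nothing to clean up.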
But the best part was that at this point I had a ton of confidence in our freelancer’s process, so I told her to start using everything we’d learned up to that point and that she didn’t need to stop unless she ran into a question or challenge.
And thanks to front-loading the training process, she pulled it off without a hitch! More than 60 videos later, I had a big ol’ library of fine-tuned transcripts for every video in our course.
Step 5: Re-sync the clean transcripts to the videos
If I’m 100% honest, I hadn’t thought about this step in my original plan and for a moment, thought I’d made a huge mistake!
My plan was to use Descript’s built-in subtitle exporter to generate .srt files that work with basically any video player that supports captions. But…our freelance editor had been doing the transcript cleanup outside of Descript. Whoops.
Thankfully, Descript had a fix built right in!
When you add a new video or audio file to Descript, by default it offers the option to auto-generate a transcript. But if you click the drop-down, you can also import an existing transcript.
Follow the steps, which include pasting in your cleaned up transcript, and within 60 seconds it automatically syncs your manually cleaned transcript to the original audio or video file.
Note: as far as I can tell this sync doesn’t count against your limited transcription minutes on your Descript plan.
With the cleaned up transcripts re-synced to the video, my .srt files were just a few clicks away and I could upload them directly to our courseware!
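If you want to spot-check an exported .srt file before uploading it, the format is simple enough to read with a few lines of Python: each caption is a numbered block with a timing line like `00:00:02,500 --> 00:00:05,000`, followed by the caption text and a blank line. The sketch below is my own illustration, not anything Descript provides; the sample captions and all the function names are made up, and it only checks for a few obvious problems.

```python
import re
from dataclasses import dataclass

# SubRip (.srt) timing lines look like: 00:01:02,500 --> 00:01:05,000
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues. Blocks are separated by blank lines."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Line 0 is the cue number, line 1 the timing, the rest is caption text.
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"Bad timing line: {lines[1]!r}")
        parts = match.groups()
        cues.append(
            Cue(_to_ms(*parts[:4]), _to_ms(*parts[4:]), "\n".join(lines[2:]))
        )
    return cues


def find_problems(cues: list[Cue]) -> list[str]:
    """Flag empty captions, cues that do not end after they start, and out-of-order cues."""
    problems = []
    previous_start = -1
    for number, cue in enumerate(cues, start=1):
        if not cue.text.strip():
            problems.append(f"cue {number}: empty caption")
        if cue.end_ms <= cue.start_ms:
            problems.append(f"cue {number}: does not end after it starts")
        if cue.start_ms < previous_start:
            problems.append(f"cue {number}: starts before the previous cue")
        previous_start = cue.start_ms
    return problems


# Made-up sample captions, just to show the file layout.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Welcome to the lesson.

2
00:00:02,500 --> 00:00:05,000
Let's get started.
"""

cues = parse_srt(SAMPLE)
print(len(cues), "cues;", find_problems(cues) or "no problems found")
```

To check a real export, read the file's text and pass it to `parse_srt` in place of `SAMPLE`.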
In total, 30x500 had approximately 732 minutes, or 12.2 hours, of video lessons.
The complete 30x500 transcript cleanup project took approximately 43.5 billable hours, at $20 per hour, for a total cleanup cost of $870.20.
Descript cost $30/month, and this project spanned about 3 months, so let’s call that cost $90.
The podcast project that we essentially used as training worked out to almost the exact same price-per-minute, so if I generously extract the first 5 hours from that and add it to this project, the standards training “cost” about $100 of billable time, though we benefit greatly from the output of that training since all of our podcasts have transcripts now too.
That brings our total hard cost to $1060.20, or about $1.45 per minute of media ALL IN!
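If you want to plug in your own numbers, the whole breakdown fits in a few lines of Python. This is just the arithmetic from this section written out; the variable names are illustrative, and the $20 hourly rate and 5 training hours are the figures quoted above.

```python
# Figures from the breakdown above; names are illustrative only.
MEDIA_MINUTES = 732          # total 30x500 video, about 12.2 hours
EDITOR_RATE_USD = 20.0       # hourly rate for transcript cleanup
EDITING_USD = 870.20         # billed cleanup cost for the course videos
DESCRIPT_USD = 30 * 3        # $30/month over roughly 3 months
TRAINING_USD = 5 * EDITOR_RATE_USD  # first 5 hours of podcast work, counted as training

total_usd = EDITING_USD + DESCRIPT_USD + TRAINING_USD
per_minute = total_usd / MEDIA_MINUTES
billable_hours = EDITING_USD / EDITOR_RATE_USD
hours_per_media_hour = billable_hours / (MEDIA_MINUTES / 60)

print(f"Total hard cost: ${total_usd:.2f}")
print(f"Cost per minute of media: ${per_minute:.3f}")
print(f"Billable editing hours: {billable_hours:.1f}")
print(f"Editing hours per hour of video: {hours_per_media_hour:.1f}")
```

The last line is the number I’d use for planning: it suggests budgeting roughly three and a half hours of editing for every hour of finished video, at least for an editor who has already been trained on your style.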
Best of all, we now have a direct relationship with our transcription freelancer so when we want more transcripts cleaned up, I don’t have to worry about finding a new freelancer and starting the training process all over again.
The next level will be having our editor document her process into a standards and style guide that other freelancers can follow, in the event she’s no longer available.
Since this project we’ve also hired someone whose expertise is making sure that PDFs and other documents are fully accessible, and she’s helping us deliver transcripts that play nicely with screen readers too.
Don’t wait like we did.
You can and should improve your own course or podcast’s accessibility!
Last year, we invested in the financial accessibility of 30x500 by creating programs to offer adjusted prices for a global population of students, and access scholarships for BIPOC entrepreneurs.
Within the first 6 months of these programs alone, we saw a more than 20x increase in participation from the communities they were designed to support. We still have a ways to go, and the real success will be seen in years not months as those students build their businesses into product empires. But we’re encouraged by these results!
The addition of captions and transcripts is the next small step in the journey of improving access around 30x500 and other Stacking the Bricks programs and properties. Early response in our community has been very positive, and we’re going to keep going.
And we hope that sharing the behind-the-scenes encourages more online educators to do the same!
If you teach a video course online, popular platforms like Teachable and Podia both support adding captions produced by this process. The effort and cost can be front-loaded, and once things are moving it’s a fairly straightforward process.
Amy and I hope this guide helps you invest in improved accessibility for your students, too!