16

On YouTube, I can download the CC transcript for a video, but the transcript does not contain punctuation. How can I punctuate the transcript automatically?

William
  • Can you specify whether you are trying to do it via the YouTube APIs or via code on the client? – Nate T Dec 25 '20 at 04:51
  • Any method is welcome. A piece of software or a service would be best, i.e., upload the raw transcript/video/audio and download the punctuated transcript. – William Dec 25 '20 at 06:31

3 Answers

9

This is a problem studied in Natural Language Processing (NLP), often referred to as punctuation restoration. Some deep learning solutions achieve decent results, although they aren't perfect. You can try https://github.com/ottokart/punctuator2, which is based on this paper (you can try it out here).

wiktort1
  • Fantastic tool. I just used it to punctuate a YouTube transcript and it works great. I tried the whole document at first, but it stopped auto-punctuating at around 35K characters, so I hand-divided it into reasonable chunks. What a timesaver. – Chris C Sep 25 '22 at 15:21
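
The manual chunking described in the comment above could also be scripted. Below is a minimal sketch, not part of punctuator2 itself: the function name `split_into_chunks` and the 30,000-character default are assumptions chosen to stay under the roughly 35K-character point mentioned in the comment. It splits on whitespace only, since an unpunctuated transcript has no sentence boundaries to split on.

```python
def split_into_chunks(text: str, max_chars: int = 30000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking only between words.

    Hypothetical helper: the default limit is an assumption, not a documented
    limit of any punctuation tool.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # Account for the joining space when the chunk already has words.
        added = len(word) + (1 if current else 0)
        if current and length + added > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be punctuated separately and the results joined with spaces. A single word longer than `max_chars` ends up in a chunk of its own that exceeds the limit, which should not matter for ordinary transcripts.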
4

There's no way to get punctuation from YouTube itself; you'll have to generate it yourself. Google offers a service that generates punctuation for arbitrary text, and in my personal experience it's more accurate than some competitors, so I would run the transcript through that.

Carson
  • This service requires you to extract the audio from the video and upload it, and it is a paid service. – William Dec 25 '20 at 06:33
3

In 2023 there are multiple ways to do it:

  1. Use ChatGPT. It works very well, but because of limits on input text it's quite a cumbersome process for long videos (60+ minutes). Apart from processing the text in batches, you have to check the output quality of each batch, as it is not 100% consistent yet.
  2. Use Deep Multilingual Punctuation Prediction. It can restore punctuation with 77% accuracy for English text, but it won't fix capital letters.
  3. Use yt-dlp and Whisper. Download the audio as mp3 from YouTube and run Whisper on it. This OpenAI model does very good speech-to-text and produces output with punctuation, but it's quite slow for long video/audio (processing 60 minutes of audio takes approx. 30 minutes). Example implementation
  4. Use yt-dlp and whisper.cpp. This works faster: processing 60 minutes of audio takes less than 10 minutes. My example implementation
  5. Use Shoki.app
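
For option 2, the missing capital letters can be partly recovered with a small post-processing step once punctuation is in place. The following is a rough sketch under stated assumptions: `capitalize_sentences` is a hypothetical helper, not part of any of the tools listed, and it relies on a simple regex heuristic that will also capitalize after abbreviations such as "e.g." and cannot recover proper nouns.

```python
import re


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each sentence.

    Heuristic only: a sentence start is assumed to be the beginning of the
    text or a lowercase letter following '.', '!' or '?' plus whitespace.
    """
    def upper(match: re.Match) -> str:
        # Keep the punctuation and whitespace, uppercase the letter after it.
        return match.group(1) + match.group(2).upper()

    text = re.sub(r"(^\s*|[.!?]\s+)([a-z])", upper, text)
    # Also capitalize the standalone pronoun "i" (naive: affects "i.e." too).
    return re.sub(r"\bi\b", "I", text)
```

For example, `capitalize_sentences("hello world. how are you? i am fine.")` returns `"Hello world. How are you? I am fine."`.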
Alena Melnikova
  • I tried using ChatGPT. Indeed it works well, but the prompt has to be carefully written, otherwise the target text might change. It is also not free. – Andy May 02 '23 at 14:02