How can I transform DVB subtitles into a text format using FFmpeg in a live stream, or how can I optimize the DVB subtitle burning process?

Question

I am working on an HLS transcoder that converts any input format to HLS, and I need to encode multiple "dvbsub" subtitles at the same time so that a client reading the m3u8 HLS playlist can select among them.

The main problem is that burning each dvbsub into the live video stream like this:

 -filter_complex "[0:v][0:s:0]overlay[v0];[0:v][0:s:1]overlay[v1];[0:v][0:s:2]overlay[v2];......"

is a very CPU-intensive task (I have 8 or more dvbsub streams in the same input).
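For reference, the filter graph above follows a regular pattern, one overlay chain per subtitle stream, so it can be generated rather than typed by hand. The following is only a minimal Python sketch of that pattern; the function name `build_overlay_filter` is hypothetical, and it assumes every subtitle stream lives in input 0, as in the command above. Each labelled output (`[v0]`, `[v1]`, ...) still has to be encoded separately, which is presumably where most of the CPU cost comes from.

```python
def build_overlay_filter(num_subs: int) -> str:
    """Build a -filter_complex graph with one overlay chain per subtitle stream.

    Hypothetical helper: assumes all subtitle streams are in input 0 and that
    each burned-in video is exposed under the label [v<index>].
    """
    if num_subs < 1:
        raise ValueError("num_subs must be at least 1")
    # One chain per subtitle stream: overlay subtitle i on the video, label it [vi].
    chains = [f"[0:v][0:s:{i}]overlay[v{i}]" for i in range(num_subs)]
    return ";".join(chains)


if __name__ == "__main__":
    print(build_overlay_filter(3))
```

The resulting string would be passed as the argument to `-filter_complex`, with each `[vN]` label mapped to its own output.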

Does anyone know how to transform each dvbsub into a text format (WebVTT, for example), or whether there is a way to optimize the process? (I tried to perform the burning on an NVIDIA GPU, but it did not bring any improvement.)

I have read about OCR programs that can do this, but after days of research I still don't know how to do it.

Thanks in advance.

EDIT: The input is a live UDP signal. I need to do the transformation on the fly.

Thank you, but that solution doesn't work for me. I can't get the subtitles from a file and do the OCR conversion that way. I need to take video, audio and subtitles from a live UDP MPEG-TS and convert the subtitles in real time. The subtitles filter doesn't work for a UDP signal, and filter_complex is what I am using right now. — alexsua, Oct 28 '18 at 19:18
Then you're out of luck. At present, there are no ffmpeg filters which emit subtitles. OCR can be performed, but that will have to be dumped to file. — Gyan, Oct 28 '18 at 19:31

score 3 · Accepted Answer · answered Dec 05 '18 at 09:10

With ccextractor (https://github.com/CCExtractor/ccextractor) you can extract dvbsub and dvb_teletext subtitles.

To extract dvbsubs you will need to compile ccextractor with OCR support.

Install dependencies:

$ sudo apt-get update
$ sudo apt-get install tesseract-ocr-dev
$ sudo apt-get install tesseract-ocr-*
$ sudo apt-get install -y gcc
$ sudo apt-get install -y libcurl4-gnutls-dev
$ sudo apt-get install -y libleptonica-dev

In the ccextractor source directory:

$ mkdir build && cd build
$ cmake -DWITH_OCR=ON ../src/ 
$ make -j4

Stream your content over UDP (-map 0:18 selects only the dvbsub stream from the multiplex):

$ ffmpeg -re -i mux562.ts -map 0:18 -c:s dvbsub -f mpegts udp://239.0.0.1:5000

Read your UDP stream live and get SRT output:

$ ccextractor -s -codec dvbsub -in=ts -udp 239.0.0.1:5000 -o output.srt

You can also write the SRT output to a FIFO or to stdout; please refer to the ccextractor help.
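Since the question asks for WebVTT rather than SRT, the text that ccextractor produces still needs a small conversion. The two formats are close: WebVTT starts with a `WEBVTT` header line and uses a dot instead of a comma before the milliseconds in cue timings. Below is a minimal Python sketch of that conversion; the function name `srt_to_webvtt` is hypothetical, it works on a complete block of SRT text, and it does not cover segmenting the result for an HLS playlist.

```python
import re

# SRT timestamps look like 00:00:01,000; WebVTT expects 00:00:01.000
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert a block of SRT text to WebVTT (hypothetical helper)."""
    # Normalize line endings and drop a leading byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")
    out = []
    for line in text.split("\n"):
        # Only rewrite timing lines, so commas in the cue text are untouched.
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    # SRT cue numbers are kept; they are valid WebVTT cue identifiers.
    return "WEBVTT\n\n" + "\n".join(out).strip() + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nHello, world\n\n"
    print(srt_to_webvtt(sample))
```

In a live setup the same conversion would have to be applied cue by cue as the SRT text arrives, for example when reading it from a FIFO or from stdout.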

score 0 · Answer 2 · answered Oct 29 '18 at 10:58

This is the answer to your question, however, it won't be accepted as such because you won't like the answer.

You can't do it. That unfortunately is the answer.

Your subtitles are graphics-based bitmaps, so you have to OCR them and then check the result for errors and anomalies beforehand. You can't do it on the fly.

Depending on what you are playing, there are many online resources where text-based subtitle equivalents are available.

I wish you luck.
