
I have a set of tiny cameras that stream H.264-encoded video over TCP. I need to connect to them on demand, based on user actions in the browser, and display the live stream to the user.

I've been searching all over the Internet for a way to achieve this, without success. The closest I got was writing a small C++ program using libav that connects to the video stream, saves the frames as Motion JPEG, and then uses mjpg_streamer to serve the result as a live stream. But this solution is overly complicated, and my program crashes with errors like:

Failed to decode av_out_packet: Operation now in progress

Or

Failed to read av_frame
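
I suspect the "Operation now in progress" suffix is misleading: I print these errors with perror(), which reads errno, while libav functions report failures through their return value. A small helper using av_strerror (a sketch, assuming FFmpeg 4.x headers) would probably show the real cause:

#include <cstdio>
extern "C" {
#include <libavutil/error.h>
}

// print a libav error from the function's return code instead of errno
static void log_av_error(const char *msg, int errnum) {
    char buf[AV_ERROR_MAX_STRING_SIZE] = {0};
    av_strerror(errnum, buf, sizeof(buf));
    std::fprintf(stderr, "%s: %s\n", msg, buf);
}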

Here's the piece of code I use to decode the stream.

void decode_stream(const char * address, int threadIdx, const char * output_dir) {
    std::cout << "Started decoding thread ID: " << std::this_thread::get_id()  << "  TID: " << threadIdx << std::endl;

    AVFormatContext *av_format_ctx = avformat_alloc_context();

    // register timeout callback
    auto * ith = new ffmpeg_interrupt_handler(default_timeout * 10);
    av_format_ctx->interrupt_callback.opaque = (void *)ith;
    av_format_ctx->interrupt_callback.callback = &ffmpeg_interrupt_handler::check_interrupt;
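    // note: ffmpeg_interrupt_handler is a custom helper (not shown here); its
    // check_interrupt callback must return non-zero once the timeout expires,
    // so that blocking libav calls fail instead of hanging forever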

    AVInputFormat *av_input_fmt = av_find_input_format("h264");

    if (avformat_open_input(&av_format_ctx, address, av_input_fmt, nullptr) != 0) {
        avformat_close_input(&av_format_ctx);
        perror("Could not open input context");
        exit(EXIT_FAILURE);
    }

    int video_stream_index = -1;

    AVCodec* av_codec = nullptr;
    AVCodecParameters* av_codec_params = nullptr;

    //find valid video stream
    for (unsigned int i = 0; i < av_format_ctx->nb_streams; ++i) {
        av_codec_params = av_format_ctx->streams[i]->codecpar;
        av_codec = avcodec_find_decoder(av_codec_params->codec_id);

        if (!av_codec) {
            perror("Could not find coded decoder");
            continue;
        }

        if (av_codec_params->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream_index = i;
            break;
        }
    }

    if (video_stream_index == -1) {
        perror("Could find valid video stream.");
        exit(EXIT_FAILURE);
    }

    //allocate codec context
    AVCodecContext * av_codec_ctx = avcodec_alloc_context3(av_codec);
    if (!av_codec_ctx) {
        perror("Could not create AVCodec Context\n");
        exit(EXIT_FAILURE);
    }

    if (avcodec_parameters_to_context(av_codec_ctx, av_codec_params) < 0) {
        perror("Could not initialize AVCodec Context\n");
        exit(EXIT_FAILURE);
    }

    if (avcodec_open2(av_codec_ctx, av_codec, nullptr) < 0) {
        perror("Could not open AVCodec\n");
        exit(EXIT_FAILURE);
    }

    AVFrame* av_frame = av_frame_alloc();

    if (!av_frame) {
        perror("Could not allocate AVFrame");
        exit(EXIT_FAILURE);
    }

    AVPacket *av_packet = av_packet_alloc();
    if (!av_packet) {
        perror("Could not allocate AVFrame");
        exit(EXIT_FAILURE);
    }

    AVCodec *av_out_codec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);
    if (!av_out_codec) {
        perror("Could not find MJPEG codec");
        exit(EXIT_FAILURE);
    }

    AVCodecContext *av_out_codec_ctx = avcodec_alloc_context3(av_out_codec);
    if (!av_out_codec_ctx) {
        perror("Could not allocate output context");
        exit(EXIT_FAILURE);
    }

    // use the decoder's reported size instead of hardcoding 1280x720, otherwise
    // the encoder rejects frames if the camera uses a different resolution
    av_out_codec_ctx->width = av_codec_ctx->width;
    av_out_codec_ctx->height = av_codec_ctx->height;
    av_out_codec_ctx->pix_fmt = AV_PIX_FMT_YUVJ420P;
    // time_base must be a real rational (nominal 25 fps here);
    // AVFMT_VARIABLE_FPS is a format flag, not a frame rate
    av_out_codec_ctx->time_base = (AVRational){1, 25};

    if (avcodec_open2(av_out_codec_ctx, av_out_codec, nullptr) < 0) {
        perror("Could not open output codec");
        exit(EXIT_FAILURE);
    }

    AVPacket *av_out_packet = av_packet_alloc();

    std::string output_filename = output_dir;

    if (! fs::exists(output_dir)) {
        fs::create_directory(output_dir);
    } else if (! fs::is_directory(output_dir)) {
        perror("Target output is not a directory!");
        exit(EXIT_FAILURE);
    }

    std::string output_final_dir = output_dir;
    // note: assumes output_dir ends with a path separator
    output_final_dir += "stream_" + std::to_string(threadIdx);

    if (! fs::exists(output_final_dir)) {
        fs::create_directory(output_final_dir);
    }

    output_filename += "stream_" + std::to_string(threadIdx) + "/stream_" + std::to_string(threadIdx) + ".jpg";

    int response;
    FILE *JPEGFile = nullptr;

    ith->reset(default_timeout);
    while (av_read_frame(av_format_ctx, av_packet) >= 0) {
        if (av_packet->stream_index == video_stream_index) {
            response = avcodec_send_packet(av_codec_ctx, av_packet);

            if (response < 0) {
                perror("Failed to decode av_out_packet");
                exit(EXIT_FAILURE);
            }

            response = avcodec_receive_frame(av_codec_ctx, av_frame);
            if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
                av_packet_unref(av_packet); // don't leak the packet while the decoder needs more input
                continue;
            } else if (response < 0) {
                perror("Failed to decode av_out_packet");
                exit(EXIT_FAILURE);
            }

            if (av_frame->format != AV_PIX_FMT_YUV420P) {
                printf("Warning: decoded frame is not YUV420P; the generated JPEG may be wrong\n");
            }

            // send frame to the MJPEG encoder and receive the encoded packet,
            // checking the return codes this time
            response = avcodec_send_frame(av_out_codec_ctx, av_frame);
            if (response >= 0)
                response = avcodec_receive_packet(av_out_codec_ctx, av_out_packet);
            if (response == AVERROR(EAGAIN)) { // encoder needs more frames first
                av_packet_unref(av_packet);
                continue;
            } else if (response < 0) {
                perror("Failed to encode av_out_packet");
                break;
            }

            // open output
            JPEGFile = fopen(output_filename.c_str(), "wb");
            if (JPEGFile == nullptr) {
                perror("Could not open output file");
                break; // don't fclose a null pointer
            }
            // write to output
            fwrite(av_out_packet->data, 1, av_out_packet->size, JPEGFile);

            // close output (the FILE* is unusable after fclose either way)
            fclose(JPEGFile);
            JPEGFile = nullptr;

            // unref out packet
            av_packet_unref(av_out_packet);

            av_packet_unref(av_packet);
            // reset packet timeout
            ith->reset(default_timeout);
        }
    }


    if (JPEGFile != nullptr) {
        fclose(JPEGFile);
        JPEGFile = nullptr;
    }
    std::cout << "Exiting thread: " << threadIdx << std::endl;
    should_stop_thread[threadIdx] = true;
    av_packet_free(&av_out_packet);
    avcodec_free_context(&av_out_codec_ctx); // avcodec_close alone would leak the context

    av_frame_free(&av_frame);
    av_packet_free(&av_packet);
    avformat_close_input(&av_format_ctx); // also frees the context; no separate avformat_free_context needed
    avcodec_free_context(&av_codec_ctx);
}
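
For reference, my understanding of the send/receive API from the avcodec documentation is that one packet can produce zero or more frames, so avcodec_receive_frame should be drained in a loop, and every packet returned by av_read_frame should be unreferenced, not just the video ones. A sketch of how I believe the read loop is intended to look (same variable names as above):

while (av_read_frame(av_format_ctx, av_packet) >= 0) {
    if (av_packet->stream_index == video_stream_index) {
        if (avcodec_send_packet(av_codec_ctx, av_packet) < 0)
            break;
        // drain: one packet may yield zero, one, or several frames
        while (avcodec_receive_frame(av_codec_ctx, av_frame) == 0) {
            // ... encode av_frame to MJPEG and write the JPEG file ...
        }
    }
    av_packet_unref(av_packet); // unref every packet, including non-video ones
}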

Anyway, if there is a simpler solution that I am missing, I am open to it. Latency between the real stream and the displayed video is critical for me and cannot be more than 1 second.

Lukáš Moravec
  • use FFmpeg to repackage (-codec copy) the H.264 stream into a fragmented MP4 container, then display it with Media Source Extensions (see the command sketch after these comments). – szatmary Apr 14 '21 at 21:26
  • OK, and how can I start ffmpeg on demand? I need to send an HTTP request to a server which would start the program. I tried using Media Source Extensions, but there was always at least a 10-second delay between the stream and the displayed video, which is wrong. I need the video in the browser to be delayed by no more than a few hundred ms. – Lukáš Moravec Apr 15 '21 at 07:25
  • Is this raw H.264, or is it boxed in [fMP4](https://stackoverflow.com/questions/35177797/what-exactly-is-fragmented-mp4fmp4-how-is-it-different-from-normal-mp4) or [webm](https://www.webmproject.org/about/)? How is it delivered? One HTTP request delivering an endless response stream? Does the H.264 stream always start fresh when the camera gets a request? This *can* be done in pure browser JavaScript, but you'll have to know a lot more about the incoming data than you've mentioned. Transcoding to MJPEG won't give you low enough latency. – O. Jones Apr 15 '21 at 12:29
  • AFAIK, every time I connect to the camera over a TCP socket, a full frame is sent first and then only the differences between frames. So I suppose the stream starts fresh with each request. And yes, it is then delivered endlessly unless I disconnect from the camera or some error occurs. As for your question about fMP4 or webm, I am not sure and cannot answer that. Is there a way to find out which method is used? – Lukáš Moravec Apr 15 '21 at 12:38
  • OK, so what other information do I need? And then, how could it be done in the browser with pure JS? Do you mean using WebSockets and MSE? – Lukáš Moravec Apr 16 '21 at 05:48
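
A sketch of the repackaging command szatmary suggests, assuming the camera is reachable at tcp://<camera-ip>:<port> (placeholders) and an FFmpeg build with TCP and MP4 support. The fragmented MP4 is written to stdout, so a small server started on demand could forward it to the browser (e.g. over a WebSocket) and append it to a MediaSource SourceBuffer:

ffmpeg -i tcp://<camera-ip>:<port> -c copy \
       -f mp4 -movflags frag_keyframe+empty_moov+default_base_moof \
       pipe:1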
