
I'm trying to capture a single image from an H.264 video stream on my Raspberry Pi. The stream comes from raspivid over a websocket, but I cannot display a correct image with imshow(). I also tried .reshape(), but got ValueError: cannot reshape array of size 3607 into shape (480,640,3)

On the client side, I successfully connect to the video stream and receive the incoming bytes. The server uses raspivid-broadcaster for the video streaming. I guessed the first chunk of bytes could be decoded into an image, so I wrote the following code.

import asyncio
import json

import cv2
import numpy
import websockets

async def get_image_from_h264_streaming():

    uri = "ws://127.0.0.1:8080"
    async with websockets.connect(uri) as websocket:
        frame = json.loads(await websocket.recv())

        print(frame)
        width, height = frame["width"], frame["height"]

        response = await websocket.recv()
        print(response)

        # transform the byte read into a numpy array
        in_frame = (
            numpy
            .frombuffer(response, numpy.uint8)
            # .reshape([height, width, 3])
        )

        # Display the frame
        cv2.imshow('in_frame', in_frame)

        cv2.waitKey(0)

asyncio.get_event_loop().run_until_complete(get_image_from_h264_streaming())

print(frame) shows

{'action': 'init', 'width': 640, 'height': 480}

print(response) shows

b"\x00\x00\x00\x01'B\x80(\x95\xa0(\x0fh\x0..............xfc\x9f\xff\xf9?\xff\xf2\x7f\xff\xe4\x80"

Any suggestions?

---------------------------------- EDIT ----------------------------------

Thanks for this suggestion. Here is my updated code.

import asyncio

import av
import cv2
import websockets

def decode(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    packets = code_ctx.parse(raw_bytes)
    for i, packet in enumerate(packets):
        frames = code_ctx.decode(packet)
        if frames:
            return frames[0].to_ndarray() 

async def save_img():
    async with websockets.connect("ws://127.0.0.1:8080") as websocket:
        image_init = await websocket.recv()

        count = 0
        combined = b''

        while count < 3:
            response = await websocket.recv()
            combined += response
            count += 1

        frame = decode(combined)
        print(frame)

        cv2.imwrite('test.jpg', frame)

asyncio.get_event_loop().run_until_complete(save_img())

print(frame) shows

[[109 109 109 ... 115  97 236]
 [109 109 109 ... 115  97 236]
 [108 108 108 ... 115  97 236]
 ...
 [111 111 111 ... 101 103 107]
 [110 110 110 ... 101 103 107]
 [112 112 112 ... 104 106 110]]

Below is the saved image I get. It has the wrong size of 740 (height) x 640 (width); the correct size is 480 (height) x 640 (width). I am also not sure why the image is grayscale instead of color.


---------------------------------- EDIT 2 ----------------------------------

Below is the main method to send data in raspivid.

raspivid - index.js

const {port, ...raspividOptions} = {...options, profile: 'baseline', timeout: 0};
videoStream = raspivid(raspividOptions)
    .pipe(new Splitter(NALSeparator))
    .pipe(new stream.Transform({
        transform: function (chunk, _encoding, callback){
            ...
            callback();
        }
    }));

videoStream.on('data', (data) => {
    wsServer.clients.forEach((socket) => {
        socket.send(data, {binary: true});
    });
});

stream-split - index.js (a line of code shows the maximum buffer size is 1 MB)

class Splitter extends Transform {

  constructor(separator, options) {
    ...
    this.bufferSize  = options.bufferSize  || 1024 * 1024 * 1  ; //1Mb
    ...
  }

  _transform(chunk, encoding, next) {

    if (this.offset + chunk.length > this.bufferSize - this.bufferFlush) {
        var minimalLength = this.bufferSize - this.bodyOffset + chunk.length;
        if(this.bufferSize < minimalLength) {
          //console.warn("Increasing buffer size to ", minimalLength);
          this.bufferSize = minimalLength;
        }
          
        var tmp = new Buffer(this.bufferSize);
        this.buffer.copy(tmp, 0, this.bodyOffset);
        this.buffer = tmp;
        this.offset = this.offset - this.bodyOffset;
        this.bodyOffset = 0;
    }
    ...
  }
};

----------Completed Answer (Thanks Ann and Christoph for the direction)----------

Please see in answer section.

  • So `cv2.imshow('in_frame', in_frame)` doesn't display anything? – Red Mar 10 '22 at 05:16
  • @AnnZen The display does pop up, but no image is shown. I guess the reshape isn't set correctly? – Pak Ho Cheung Mar 10 '22 at 05:39
  • Could your received image be larger than it should be because the last chunk of data (or part of it) belongs to the next frame? – Red Mar 13 '22 at 13:40
  • @AnnZen I guess this may be one of the causes. Please see my edit. I can successfully get the image, but somehow it is grayscale. – Pak Ho Cheung Mar 13 '22 at 19:08
  • Did you use any code to encode the image before sending it via sockets? What code was it? – Red Mar 13 '22 at 22:10
  • @AnnZen Please see my edit 2. I embedded the repository from GitHub; edit 2 shows the main code I found. The buffer size should have a max of 1 MB, so the split should be fine, as each chunk I receive is less than 1 MB. – Pak Ho Cheung Mar 14 '22 at 14:32
  • Is there a `cv2.VideoCapture()` involved in the code? – Red Mar 14 '22 at 15:24
  • @AnnZen No. It's all Javascript – Pak Ho Cheung Mar 14 '22 at 15:47

3 Answers


One question: how is the frame/stream transmitted through the websocket? The byte sequence looks like a NAL unit; it could be PPS or SPS, etc. How do you know it's an I-frame, for example? I don't know if cv2.imshow supports raw H.264. Look into PyAV: with it you can open raw H.264 bytes and then try to extract one frame out of it :) Let me know if you need help with PyAV. Look at this post, there is an example of how you can do it.
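
As a rough illustration of what those bytes are (a sketch, not part of the broadcaster code): in an Annex B H.264 stream every NAL unit is preceded by a 00 00 01 start code, and the low 5 bits of the byte right after it give the NAL unit type (7 = SPS, 8 = PPS, 5 = IDR slice). You can peek at a received chunk like this:

def nal_unit_types(data: bytes):
    # Collect the nal_unit_type of every Annex B start code found in the chunk.
    types = []
    i = 0
    while True:
        i = data.find(b'\x00\x00\x01', i)
        if i == -1 or i + 3 >= len(data):
            break
        types.append(data[i + 3] & 0x1F)  # low 5 bits of the NAL header byte
        i += 3
    return types

# The bytes in the question start with \x00\x00\x00\x01\x27, i.e. type 7 (an SPS).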

Update

Based on your comment, you need a way to parse and decode a raw H.264 stream. Below is a function that gives you an idea of how to do that. Pass the bytes you received from the websocket to this function, and be aware that there needs to be enough data to extract one frame.

pip install av

PyAV docs

import av

# Feed in your raw bytes from socket
def decode(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    packets = code_ctx.parse(raw_bytes)
    for i, packet in enumerate(packets):
        frames = code_ctx.decode(packet)
        if frames:
            return frames[0].to_ndarray()

You could also try to read the stream directly with PyAV, using av.open("tcp://127.0.0.1:").

Update 2

Could you please test this? The issues you describe in your edit are weird. You don't need a websocket layer; I think you can read directly from raspivid:

raspivid -a 12 -t 0 -w 1280 -h 720 -vf -ih -fps 30 -l -o tcp://0.0.0.0:5000

import av
import cv2

def get_first_frame(path):
    stream = av.open(path, 'r')
    for packet in stream.demux():
        frames = packet.decode()
        if frames:
            return frames[0].to_ndarray(format='bgr24')

ff = get_first_frame("tcp://0.0.0.0:5000")
cv2.imshow("Video", ff)
cv2.waitKey(0)
Christoph
  • From this post, the streaming is raw H.264, as I use the same method for streaming. https://forums.raspberrypi.com/viewtopic.php?t=231368 The post says the stream contains SPS, PPS, I-frames and P-frames. Thanks – Pak Ho Cheung Mar 11 '22 at 12:08
  • I updated my answer, that's the way you need to go =) I did something similar but got H.264 over an RTP session; in the end it's also a raw H.264 stream. Just ask if you need help. – Christoph Mar 11 '22 at 13:47
  • I have updated my post. Please check above. I can successfully get an image from the streaming, but I have three questions. 1. Not sure why the image is grayscale instead of color. 2. Is there any way to count how much raw data I should get, so that I don't need to grab a chunk of data and convert it to an image? 3. The dimension is not correct. (But I think this is easy to solve by cropping the image.) Thanks. – Pak Ho Cheung Mar 11 '22 at 19:40
  • Update added above, BR – Christoph Mar 12 '22 at 06:02
  • Tried with your code, but got this error: ``av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input``. I checked ``netstat -an``; it shows ``cp6 0 0 :::8080 :::* LISTEN``. Does that mean the data format is not correct? – Pak Ho Cheung Mar 12 '22 at 18:13
  • Did you use my raspivid command? It looks like you read from the websocket and not from tcp://0.0.0.0:5000 – Christoph Mar 15 '22 at 09:42
  • I finally found the answer. Should use ``frames[0].to_image()`` instead of ``frames[0].to_ndarray()``. Please see my edit for further explanation. Thanks for the direction – Pak Ho Cheung Mar 15 '22 at 11:51
  • great =) if you give me the bounty I am not unhappy hahaha – Christoph Mar 15 '22 at 11:53

The PyAV and Pillow packages are required; there is no need to use cv2 anymore. So, install the packages:

pip3 install av
pip3 install Pillow

Codes

import asyncio
import websockets
import av
import PIL

def decode_image(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    packets = code_ctx.parse(raw_bytes)
    for i, packet in enumerate(packets):
        frames = code_ctx.decode(packet)
        if frames:
            return frames[0].to_image()

async def save_img_from_streaming():

    uri = "ws://127.0.0.1:8080"
    async with websockets.connect(uri) as websocket:
        image_init = await websocket.recv()

        count = 0
        combined = b''

        while count < 2:
            response = await websocket.recv()
            combined += response
            count += 1

        img = decode_image(combined)
        img.save("img1.png","PNG")

asyncio.get_event_loop().run_until_complete(save_img_from_streaming())

In Christoph's answer, to_ndarray is suggested, but I found that it somehow results in a grayscale image. This is caused by the returned numpy array having the wrong form, like [[...], [...], [...], ...], whereas a color image should be an array like [[[...], [...], [...], ...], ...]. Then I looked at the PyAV docs and found another method, to_image, which can return an RGB PIL.Image of the frame. Just using that function gets what I need.
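
For what it's worth, to_ndarray can also return a color array if a pixel format is passed explicitly; without one the frame stays in its planar YUV layout, which is why it prints as a single grey-looking plane. A minimal sketch, reusing frames from inside decode_image above:

# Either of these returns a (height, width, 3) array instead of the planar YUV plane.
rgb = frames[0].to_ndarray(format='rgb24')
bgr = frames[0].to_ndarray(format='bgr24')  # channel order cv2.imwrite expects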

Note that the response from await websocket.recv() may be different; it depends on how the server sends the data.
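
If you don't know in advance how many chunks make up a decodable frame, one option (just a sketch, assuming the same decode_image and server as above) is to keep appending received bytes until the decoder produces a frame:

async def save_first_decodable_frame(uri="ws://127.0.0.1:8080"):
    async with websockets.connect(uri) as websocket:
        await websocket.recv()                  # discard the init message
        combined = b''
        img = None
        while img is None:
            combined += await websocket.recv()  # append the next chunk
            img = decode_image(combined)        # None until enough data has arrived
        img.save("img1.png", "PNG")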

Pak Ho Cheung

This is a problem I once had when attempting to send numpy images (converted to bytes) through sockets. The problem was that the byte string was too long.

So instead of sending the entire image at once, I sliced the image so that I had to send, say, 10 slices of the image. Once the other end receives the 10 slices, simply stack them together.

Keep in mind that depending on the size of your images, you may need to slice them more or less to achieve the optimal results (efficiency, no errors).
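
A minimal sketch of that slicing idea with plain numpy (the names are hypothetical and not tied to the raspivid setup): split the image into horizontal bands on the sender, tag each band with its index, and stack them back together on the receiver.

import numpy as np

def slice_image(img: np.ndarray, n_slices: int = 10):
    # One (index, band) pair per horizontal strip; array_split copes with
    # heights that don't divide evenly.
    return list(enumerate(np.array_split(img, n_slices, axis=0)))

def rebuild_image(indexed_bands):
    # Stack the strips back together in index order.
    ordered = sorted(indexed_bands, key=lambda pair: pair[0])
    return np.vstack([band for _, band in ordered])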

Red
  • As I did use `sockets` and not `websockets`, I can't guarantee the same results. – Red Mar 10 '22 at 05:55
  • Is there any way to check the first packet and the last packet of the image byte data? So that I can combine them all and make the image. – Pak Ho Cheung Mar 10 '22 at 05:57
  • @PakHoCheung You can send each chunk of data with an integer, say 0 to 9. Once all the chunks are received you can stack them according to the integers. – Red Mar 10 '22 at 19:01
  • I'm using raspivid to do the streaming, so I can't tag each chunk of data with an integer – Pak Ho Cheung Mar 11 '22 at 12:08
  • @PakHoCheung I mean like sending a tuple of two elements each time; the first element being an integer and the second element being the sliced data. – Red Mar 11 '22 at 13:45
  • I just found out that the streaming is sending raw data, so my old method is not working. Thanks for your suggestion – Pak Ho Cheung Mar 11 '22 at 19:43
  • @PakHoCheung You're welcome. Could your received image be larger than it should be because the last chunk of data *(or part of it)* belongs to the next frame? – Red Mar 11 '22 at 20:21