
I need to stream the screen of my Windows PC to Android. I intend to use FFmpeg to capture the screen and encode using H.264 codec, send the stream through RTP and finally use MediaCodec to decode the video and display it on a SurfaceView.

I tried the following FFmpeg command:

ffmpeg -f gdigrab -i desktop -an -video_size 1920x1080 -f rtp rtp://192.168.0.12:23000

However, all the NAL units that result seem to be corrupted, because:

  1. The forbidden_zero_bit (most significant bit) of the NAL unit header is 1. For example, header of the NAL unit shown below (the byte right after 0x00 0x00 0x01) is 0xB6, so clearly the most significant bit is equal to 1.

  2. A lot of bytes in the NAL unit are equal to 0xFF. I don't actually know if they are supposed to be like this, they just seem weird to me.

This is the beginning of one of the NAL units outputted by FFmpeg, captured with Wireshark:

0000   00 00 01 b6 56 5a bc 7c fd de ea e7 72 ff ff ff
0010   ff ff ff ef 7d d7 ff bd 6f 5f ff ee d7 ba bf ff
0020   fd df bd 7b a5 ff ff ff ff ff fd d7 78 bf fd e2
0030   ff ff ff ff ff ff 7b fe eb ff ff ff ff ff ff ff
0040   fe f5 ff ff ff ff fd b4 c6 17 45 ba 7e f4 e9 fb
0050   d7 ef 7f de ff ff ff ff fd d7 ff 79 ff bc ff ff
0060   ff ff ff ff ff ba ff ff ff ff ff ff ff 7b ff f7
0070   27 ff ff ff de ff ff ff ff ff ff ff fe ef fd c7
0080   de ef 6f 7b db dd db 74 de dd 37 bd ef ff ff ff
0090   ff ff ff ff 77 bb ff 75 ee ee bf ff ff fb dd df
00a0   ee d7 79 5e 5f ff ff ff fb 9b ff fb d7 ff ff ff
00b0   de bf ff ff ff ff ff ff ff ff fb 9d ef bd df 00
00c0   00 8f 03 ef ff ff ff ff ff ff ff 7b f7 03 1f fd
00d0   ed e5 ba ef 5d d5 cc 5f ff ff ff ff ff ff ff ff
00e0   ff ff ff ee 06 37 be f4 f6 eb ff ff ff ff ff ff
00f0   ff ff ff ff ff ff ff ba 5f f7 af ff ff ff ff ff
0100   ff ff ff ff ff ff ff ff fd d3 fb c2 ef 1b dd ed
...
...
...

Screenshot from Wireshark (same NAL unit)
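The forbidden_zero_bit check described above can be sketched as a small helper. This is just an illustration (the class and method names are mine, not from any library); it decodes the one-byte H.264 NAL unit header into its three fields:

```java
// Decodes the one-byte H.264 NAL unit header:
// forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
public class NalHeader {
    public static boolean forbiddenZeroBit(int headerByte) {
        return (headerByte & 0x80) != 0; // MSB must be 0 in a valid NAL unit
    }
    public static int nalRefIdc(int headerByte) {
        return (headerByte >> 5) & 0x03;
    }
    public static int nalUnitType(int headerByte) {
        return headerByte & 0x1F; // 5 = non-IDR slice, 7 = SPS, 8 = PPS, ...
    }
    public static void main(String[] args) {
        int header = 0xB6; // the byte observed after 00 00 01 in the capture
        System.out.println(forbiddenZeroBit(header)); // true -> not a valid H.264 NAL unit
    }
}
```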

I also tried specifying the video codec explicitly in FFmpeg, like this:

ffmpeg -f gdigrab -i desktop -an -vcodec libx264 -f rtp rtp://192.168.0.12:23000

In this case, I don't get Annex B style NAL units, but AVCC style ones (without the 0x00 0x00 0x01 separators, but preceded by their length, as described here).

With AVCC NAL units I don't really understand where one ends and another begins, and also where that "extradata" mentioned in the question linked above is.

In summary, what I want to know is as follows:

  1. Why are the NAL units outputted by the first command corrupted?

  2. From what I understand (from here), you have to feed separate NAL units to MediaCodec for decoding. So, how do I separate NAL units in AVCC format from one another?

  3. Can I somehow force FFmpeg to output Annex B style NAL units instead of AVCC ones while specifying the video codec as libx264?

  4. Is there a more straightforward way of capturing the screen on Windows, encoding, sending the stream to the Android device and displaying the video in my app? (maybe a library or an API that is escaping my notice)

  • Have you had any success? I'm currently fighting the exact same battle. When I feed the full frame data, I do get an output, but MediaCodec is buffering around 15 frames before outputting to the surface, which means I'm getting huge latency. When I cut the frame data and feed the data between 0x00 0x00 0x01 sequences as separate NAL units, I get corrupted output. – Hey'Youssef May 10 '20 at 20:38
  • @Hey'Youssef Not yet. I'll get back on it tomorrow. Are you using ffmpeg too for capturing the screen of your Windows machine? If so, can you please share the exact command you used? – UrsulAerodinamic May 11 '20 at 13:28

1 Answer


1) Why are the NAL units outputted by the first command corrupted?

They're not getting corrupted. An RTP packet contains information other than raw H.264 data, and that information may happen to contain the byte sequence 00 00 01 without that signaling that a NAL unit follows.
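To make the point concrete: every RTP packet starts with a fixed 12-byte header (RFC 3550) before any payload, so scanning a raw capture for 00 00 01 will hit non-video bytes. A minimal sketch of that header (field layout per the RFC; the class is mine):

```java
// Sketch of the fixed 12-byte RTP header (RFC 3550). None of these bytes are
// H.264 data, and they can accidentally form sequences like 00 00 01, which is
// why scanning a raw capture for start codes produces false positives.
public class RtpHeader {
    public final int version;        // should always be 2
    public final int payloadType;    // dynamic types (96-127) are typical for H.264
    public final int sequenceNumber; // used to detect loss and reordering
    public final long timestamp;     // 90 kHz clock for video
    public final long ssrc;         // identifies the stream

    public RtpHeader(byte[] pkt) {
        version        = (pkt[0] >> 6) & 0x03;
        payloadType    = pkt[1] & 0x7F;
        sequenceNumber = ((pkt[2] & 0xFF) << 8) | (pkt[3] & 0xFF);
        timestamp      = ((pkt[4] & 0xFFL) << 24) | ((pkt[5] & 0xFFL) << 16)
                       | ((pkt[6] & 0xFFL) << 8)  |  (pkt[7] & 0xFFL);
        ssrc           = ((pkt[8] & 0xFFL) << 24) | ((pkt[9] & 0xFFL) << 16)
                       | ((pkt[10] & 0xFFL) << 8) |  (pkt[11] & 0xFFL);
    }
}
```

Only after stripping this header (and the payload-format framing that follows it) are you looking at actual H.264 data.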

2) From what I understand (from here), you have to feed separate NAL units to MediaCodec for decoding. So, how do I separate NAL units in AVCC format from one another?

You parse the stream, including the protocol overhead (the RTP headers and the payload-format framing), and extract the NAL units from it.
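For the AVCC side of the question, splitting a length-prefixed buffer is straightforward. A sketch, assuming 4-byte big-endian length prefixes (the actual prefix size comes from the lengthSizeMinusOne field of the avcC extradata, so real code should read it from there rather than hard-code 4; the class name is mine):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Splits an AVCC-style buffer (length-prefixed NAL units, no start codes)
// into individual NAL units, assuming 4-byte big-endian length prefixes.
public class AvccSplitter {
    public static List<byte[]> split(byte[] avcc) {
        List<byte[]> nalUnits = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(avcc); // ByteBuffer is big-endian by default
        while (buf.remaining() >= 4) {
            int len = buf.getInt();                      // NAL unit length
            if (len < 0 || len > buf.remaining()) break; // truncated or garbage
            byte[] nal = new byte[len];
            buf.get(nal);
            nalUnits.add(nal);
        }
        return nalUnits;
    }
}
```

Each returned array is one NAL unit, which is the granularity MediaCodec expects.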

3) Can I somehow force FFmpeg to output Annex B style NAL units instead of AVCC ones while specifying the video codec as libx264?

That would not conform to the RTP specification. If you use RTP, you get AVCC-style length values instead of start codes.
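If you're willing to sidestep RTP, one option is to mux into MPEG-TS over plain UDP; the MPEG-TS muxer carries H.264 in Annex B form, so the start codes survive. A hedged sketch reusing the address from the question (tune the encoder flags to taste):

```shell
# MPEG-TS over UDP keeps Annex B start codes, at the cost of leaving RTP behind.
# IP and port are the ones from the question; -tune zerolatency reduces encoder
# buffering for a live screen-sharing use case.
ffmpeg -f gdigrab -i desktop -an -vcodec libx264 -tune zerolatency \
       -f mpegts udp://192.168.0.12:23000
```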

4) Is there a more straightforward way of capturing the screen on Windows, encoding, sending the stream to the Android device and displaying the video in my app? (maybe a library or an API that is escaping my notice)

This is a fairly opinionated question. RTP is pretty straightforward, but there are many other choices, each with pros and cons.

szatmary
  • Actually, emulation prevention bytes are used to make sure `00 00 01` can only be found preceding a NAL unit (in Annex B style units). And they are most certainly getting corrupted, because, as I explained in my question, if the forbidden_zero_bit is equal to 1, that means errors have been introduced to the NAL unit. A really good explanation can be found here: [https://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/](https://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/) – UrsulAerodinamic May 11 '20 at 13:46
  • `You parse the stream, including the protocol overhead` Do you mean I should pass the RTP packets (including their header) directly to MediaCodec? – UrsulAerodinamic May 11 '20 at 13:50