I need to stream the screen of my Windows PC to an Android device. I intend to use FFmpeg to capture the screen and encode it with H.264, send the stream over RTP, and finally use MediaCodec to decode the video and display it on a SurfaceView.
I tried the following FFmpeg command:
ffmpeg -f gdigrab -i desktop -an -video_size 1920x1080 -f rtp rtp://192.168.0.12:23000
However, all the resulting NAL units seem to be corrupted, because:
The forbidden_zero_bit (the most significant bit) of the NAL unit header is 1. For example, the header of the NAL unit shown below (the byte right after 0x00 0x00 0x01) is 0xB6, or 1011 0110 in binary, so its most significant bit is clearly 1.
A lot of bytes in the NAL unit are equal to 0xFF. I don't actually know whether they are supposed to look like this; they just seem odd to me.
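The header check in the first point can be reproduced in a few lines. This is a minimal sketch in Python (chosen only for brevity; the function name `parse_nal_header` is made up for illustration). It splits a one-byte H.264 NAL unit header into its three fields, following the layout in the H.264 specification: 1 bit of forbidden_zero_bit, 2 bits of nal_ref_idc and 5 bits of nal_unit_type.

```python
def parse_nal_header(byte):
    """Split a one-byte H.264 NAL unit header into its three fields.

    `parse_nal_header` is a hypothetical helper, not part of any library.
    """
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,  # top bit, must be 0 in valid H.264
        "nal_ref_idc": (byte >> 5) & 0x03,         # next two bits
        "nal_unit_type": byte & 0x1F,              # low five bits
    }


# The byte that follows 0x00 0x00 0x01 in the capture below.
header = parse_nal_header(0xB6)
print(header)
```

Applied to 0xB6, this gives forbidden_zero_bit = 1, nal_ref_idc = 1 and nal_unit_type = 22, so the byte does not look like a valid H.264 NAL unit header.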
This is the beginning of one of the NAL units output by FFmpeg, captured with Wireshark:
0000 00 00 01 b6 56 5a bc 7c fd de ea e7 72 ff ff ff
0010 ff ff ff ef 7d d7 ff bd 6f 5f ff ee d7 ba bf ff
0020 fd df bd 7b a5 ff ff ff ff ff fd d7 78 bf fd e2
0030 ff ff ff ff ff ff 7b fe eb ff ff ff ff ff ff ff
0040 fe f5 ff ff ff ff fd b4 c6 17 45 ba 7e f4 e9 fb
0050 d7 ef 7f de ff ff ff ff fd d7 ff 79 ff bc ff ff
0060 ff ff ff ff ff ba ff ff ff ff ff ff ff 7b ff f7
0070 27 ff ff ff de ff ff ff ff ff ff ff fe ef fd c7
0080 de ef 6f 7b db dd db 74 de dd 37 bd ef ff ff ff
0090 ff ff ff ff 77 bb ff 75 ee ee bf ff ff fb dd df
00a0 ee d7 79 5e 5f ff ff ff fb 9b ff fb d7 ff ff ff
00b0 de bf ff ff ff ff ff ff ff ff fb 9d ef bd df 00
00c0 00 8f 03 ef ff ff ff ff ff ff ff 7b f7 03 1f fd
00d0 ed e5 ba ef 5d d5 cc 5f ff ff ff ff ff ff ff ff
00e0 ff ff ff ee 06 37 be f4 f6 eb ff ff ff ff ff ff
00f0 ff ff ff ff ff ff ff ba 5f f7 af ff ff ff ff ff
0100 ff ff ff ff ff ff ff ff fd d3 fb c2 ef 1b dd ed
...
...
...
Screenshot from Wireshark (same NAL unit)
I also tried specifying the video codec explicitly in FFmpeg, like this:
ffmpeg -f gdigrab -i desktop -an -vcodec libx264 -f rtp rtp://192.168.0.12:23000
In this case, I don't get Annex B style NAL units, but AVCC style ones (there are no 0x00 0x00 0x01 start codes; each NAL unit is instead preceded by its length, as described here).
With AVCC NAL units, I don't really understand where one ends and the next begins, or where the "extradata" mentioned in the question linked above is located.
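To make the length-prefixed layout concrete, here is a rough Python sketch of how a buffer of AVCC style NAL units could be walked and rewritten with Annex B start codes. It rests on several assumptions: the length prefix is taken to be 4 bytes, big-endian (the real size comes from the lengthSizeMinusOne field of the extradata, i.e. the AVCDecoderConfigurationRecord); the function names `split_avcc` and `to_annex_b` are hypothetical; and the sample bytes are invented. It also assumes the data really is AVCC, which has not been verified for the RTP payloads above.

```python
def split_avcc(buf, length_size=4):
    """Return the NAL units found in a buffer of length-prefixed (AVCC) units.

    length_size=4 is an assumption; the actual value is
    lengthSizeMinusOne + 1 from the stream's extradata.
    """
    units = []
    pos = 0
    while pos + length_size <= len(buf):
        # Each unit starts with its payload length as a big-endian integer.
        nal_len = int.from_bytes(buf[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(buf):
            raise ValueError("truncated NAL unit")
        units.append(buf[pos:pos + nal_len])
        pos += nal_len
    return units


def to_annex_b(units):
    """Prefix every NAL unit with a 4-byte Annex B start code."""
    return b"".join(b"\x00\x00\x00\x01" + unit for unit in units)


# Invented sample: a 2-byte unit followed by a 3-byte unit.
sample = bytes([0, 0, 0, 2, 0x67, 0x42, 0, 0, 0, 3, 0x68, 0xCE, 0x3C])
units = split_avcc(sample)
print([unit.hex() for unit in units])
print(to_annex_b(units).hex())
```

Each entry in `units` is one NAL unit without its prefix. The Annex B form is the same payloads with a start code in front of each instead of a length.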
In summary, what I want to know is as follows:
Why are the NAL units output by the first command corrupted?
From what I understand (from here), you have to feed separate NAL units to MediaCodec for decoding. So, how do I separate NAL units in AVCC format from one another?
Can I somehow force FFmpeg to output Annex B style NAL units instead of AVCC ones while specifying the video codec as libx264?
Is there a more straightforward way of capturing the screen on Windows, encoding it, sending the stream to the Android device and displaying the video in my app? (Perhaps a library or an API that I have overlooked.)