
I'm developing an application that receives H.264-encoded data over RTP, and I'm having trouble getting Android's MediaCodec to output anything. I'm depacketizing the RTP packets as described here: https://stackoverflow.com/a/7668578/10788248

After encoded frames have been reassembled I feed them into the dequeued input buffers.

I don't get any errors when I queue the input buffers, but the method onOutputBufferAvailable is never called by the decoder's callback.

The only way I'm able to get it called is by passing the end-of-stream flag, and then the output size is 0.

My questions are: is there anything obviously wrong that I'm missing, and what could cause output buffers to never become available without the codec throwing an error?

Code Updated

LinkedList<DatagramPacket> packets = new LinkedList<>();
Thread socketReader = new Thread(() -> {
        try {
            DatagramSocket socket  = new DatagramSocket(videoPort);
            socket.connect(InetAddress.getByName(remoteAddress),remotePort);
            byte[] b;
            while (true){
                b = new byte[1500];
                DatagramPacket p = new DatagramPacket(b,1500);
                socket.receive(p);
                //Log.d(TAG,"RTP: "+bytesToHex(p.getData()));
                packets.add(p);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
@SuppressLint({"NewApi", "LocalSuppress"}) Thread packetHandler = new Thread(() -> {
        try {
            MediaCodec decoder = MediaCodec.createDecoderByType(MIME_TYPE);
            MediaFormat decoderFormat = MediaFormat.createVideoFormat(MIME_TYPE, 1280, 720);
            LinkedList<ByteBuffer> decoderInputs = new LinkedList<>();
            LinkedList<Integer> decoderIndices = new LinkedList<>();

            decoder.setCallback(new MediaCodec.Callback() {
                @Override
                public void onInputBufferAvailable(@NonNull MediaCodec codec, int index) {
                    decoderInputs.add(codec.getInputBuffer(index));
                    decoderIndices.add(new Integer(index));
                }

                @Override
                public void onOutputBufferAvailable(@NonNull MediaCodec codec, int index, @NonNull MediaCodec.BufferInfo info) {
                    Log.d(TAG,"OUTPUT AVAILABLE!!!!");
                    Log.d(TAG, String.valueOf(info.size));
                }

                @Override
                public void onError(@NonNull MediaCodec codec, @NonNull MediaCodec.CodecException e) {
                    Log.e(TAG,e.toString());
                }
                @Override
                public void onOutputFormatChanged(@NonNull MediaCodec codec, @NonNull MediaFormat format) {
                    Log.d(TAG,"FORMAT CHANGED");
                }
            });
            decoder.configure(decoderFormat, null,null,0);
            decoder.start();
            RTPFrame frame = new RTPFrame();
            boolean sps = false;
            boolean pps = false;
            while (true){
                if(decoderIndices.peek()!=null){
                    DatagramPacket packet = packets.poll();
                    if(packet!=null){
                        byte[] data = packet.getData();
                        if(frame==null||(!frame.isInitialized())){
                            frame = new RTPFrame(data);
                        }
                        else{
                            if(frame.isComplete()){
                                Integer index = null;
                                ByteBuffer decoderInput = null;
                                try {
                                    decoderInput = decoderInputs.poll().put(frame.getFrame());
                                    index = decoderIndices.poll();
                                    int size = frame.getFrameSize();
                                    if (frame.SPS) {
                                        Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                        decoder.queueInputBuffer(index, 0, size, frame.getTime(), MediaCodec.BUFFER_FLAG_CODEC_CONFIG);
                                        sps = true;
                                        pps = false;
                                    } else if (frame.PPS) {
                                        if(sps){
                                            Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                            decoder.queueInputBuffer(index, 0, size, frame.getTime(), MediaCodec.BUFFER_FLAG_CODEC_CONFIG);
                                            pps = true;
                                        }
                                    } else if (!frame.badFrame) {
                                        if(sps&&pps){
                                            Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                            decoder.queueInputBuffer(index, 0, size, frame.getTime(), 0);
                                        }
                                    }
                                    else{
                                        throw new RuntimeException();
                                    }
                                }
                                catch(Exception e){
                                    e.printStackTrace();
                                    if(index!=null){
                                        decoderIndices.push(index);
                                    }
                                    decoderInput.clear();
                                    decoderInputs.push(decoderInput);
                                }
                                frame = new RTPFrame();
                            }
                            else{
                                frame.addNALUnit(data);
                            }
                        }
                        if(frame.isComplete()){
                            Integer index = null;
                            ByteBuffer decoderInput = null;
                            try {
                                decoderInput = decoderInputs.poll().put(frame.getFrame());
                                index = decoderIndices.poll();
                                int size = frame.getFrameSize();
                                if (frame.SPS) {
                                    Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                    decoder.queueInputBuffer(index, 0, size, frame.getTime(), MediaCodec.BUFFER_FLAG_CODEC_CONFIG);
                                    sps = true;
                                    pps = false;
                                } else if (frame.PPS) {
                                    if(sps){
                                        Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                        decoder.queueInputBuffer(index, 0, size, frame.getTime(), MediaCodec.BUFFER_FLAG_CODEC_CONFIG);
                                        pps = true;
                                    }
                                } else if (!frame.badFrame) {
                                    if(sps&&pps){
                                        Log.d(TAG,"Depacketized at "+Integer.toUnsignedString(frame.getTime())+" length = "+frame.getFrameSize()+": " + bytesToHex(frame.getFrame().array()));
                                        decoder.queueInputBuffer(index, 0, size, frame.getTime(), 0);
                                    }
                                }
                                else{
                                    throw new Exception("bad frame");
                                }
                            }
                            catch(Exception e){
                                e.printStackTrace();
                                if(index!=null){
                                    decoderIndices.push(index);
                                }
                                decoderInput.clear();
                                decoderInputs.push(decoderInput);
                            }
                            frame = new RTPFrame();
                        }
                    }
                }
            }
        } catch (Exception e) {e.printStackTrace();}
    });

Here's a sample of all the NAL units I received for 1 frame, and how I depacketized them.

RTP: 80 63 00 2D 27 0E 64 30 66 B4 BA 42 3C 81 E0 00 80 6F...

RTP: 80 63 00 2E 27 0E 64 30 66 B4 BA 42 3C 01 F3 0F E0 3F...

RTP: 80 63 00 2F 27 0E 64 30 66 B4 BA 42 3C 01 37 9B FA BA...

RTP: 80 63 00 30 27 0E 64 30 66 B4 BA 42 3C 01 7F EA 75 7C...

RTP: 80 63 00 31 27 0E 64 30 66 B4 BA 42 3C 01 FA D8 A9 FF...

RTP: 80 63 00 32 27 0E 64 30 66 B4 BA 42 3C 01 1B C5 BC C0...

RTP: 80 63 00 33 27 0E 64 30 66 B4 BA 42 3C 01 0F F4 9A DE...

RTP: 80 63 00 34 27 0E 64 30 66 B4 BA 42 3C 01 F4 35 CD 28...

RTP: 80 63 00 35 27 0E 64 30 66 B4 BA 42 3C 01 9E 45 70 13...

RTP: 80 E3 00 36 27 0E 64 30 66 B4 BA 42 3C 41 0F 18 0D 83...

D/RTPreader: Depacketized at 2985639104 length = 12611: 00 00 00 01 21 E0 00 80 6F F0 B4 24 CD 5F 45 80 79 6E 0C...
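
For reference, the way the first bytes of that depacketized frame follow from the dump can be sketched as below. This is a minimal illustration, not the actual RTPFrame code: the class and method names are hypothetical, and it assumes a plain 12-byte RTP header (first byte 0x80, so no CSRC entries and no extension) followed by an FU-A payload, where 0x3C is the FU indicator and the next byte is the FU header.

```java
// Hypothetical helper, not part of the RTPFrame class from the question.
class FuaHeaderDemo {
    // Fixed RTP header length, assuming no CSRC entries and no header extension.
    static final int RTP_HEADER_LEN = 12;

    // Start bit (S) of the FU header: set on the first fragment of a NAL unit.
    static boolean isStart(int fuHeader) { return (fuHeader & 0x80) != 0; }

    // End bit (E) of the FU header: set on the last fragment of a NAL unit.
    static boolean isEnd(int fuHeader) { return (fuHeader & 0x40) != 0; }

    // The original NAL header takes F and NRI (top 3 bits) from the FU indicator
    // and the NAL unit type (low 5 bits) from the FU header.
    static int rebuildNalHeader(int fuIndicator, int fuHeader) {
        return (fuIndicator & 0xE0) | (fuHeader & 0x1F);
    }

    public static void main(String[] args) {
        // First packet of the sample frame: 12 header bytes, then 3C 81 E0 00 ...
        int[] packet = {0x80, 0x63, 0x00, 0x2D, 0x27, 0x0E, 0x64, 0x30,
                        0x66, 0xB4, 0xBA, 0x42, 0x3C, 0x81, 0xE0, 0x00};
        int fuIndicator = packet[RTP_HEADER_LEN];
        int fuHeader = packet[RTP_HEADER_LEN + 1];
        System.out.printf("start=%b end=%b nalHeader=0x%02X%n",
                isStart(fuHeader), isEnd(fuHeader),
                rebuildNalHeader(fuIndicator, fuHeader));
    }
}
```

With the sample above, the rebuilt header is 0x21, which matches the byte right after the 00 00 00 01 start code in the depacketized frame. The last packet carries FU header 0x41, which has the end bit set.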

Here's a sample SPS and PPS before and after depacketizing

RTP: 80 63 00 00 5F DF A1 70 4F 2F 8A 3E 27 42 00 1F 8D 68 05 00 5B A1 00 00 03 00 01 00 00 03 00 1E 0F 10 7A 80

Depacketized at 1608491376 length = 27: 00 00 01 27 42 00 1F 8D 68 05 00 5B A1 00 00 03 00 01 00 00 03 00 1E 0F 10 7A 80

Depacketized at 1608491376 length = 7: 00 00 01 28 CE 32 48

RTP: 80 63 00 01 5F DF A1 70 4F 2F 8A 3E 28 CE 32 48
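
The NAL header bytes in these dumps (0x27, 0x28, and 0x21 for the video frame) can be decoded with a few bit operations. The sketch below is only illustrative; the class and method names are hypothetical, and it covers just the NAL unit types that come up in this question.

```java
// Hypothetical helper for reading a one-byte H.264 NAL unit header.
class NalHeaderDemo {
    // nal_ref_idc: bits 5-6 of the header byte.
    static int nalRefIdc(int header) { return (header >> 5) & 0x03; }

    // nal_unit_type: low 5 bits of the header byte.
    static int nalUnitType(int header) { return header & 0x1F; }

    // Names only the types discussed here; everything else is "other".
    static String describe(int header) {
        switch (nalUnitType(header)) {
            case 1:  return "non-IDR slice";
            case 5:  return "IDR slice";
            case 7:  return "SPS";
            case 8:  return "PPS";
            case 28: return "FU-A fragment";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        // Header bytes taken from the dumps in this question.
        for (int h : new int[]{0x27, 0x28, 0x21, 0x3C}) {
            System.out.printf("0x%02X nal_ref_idc=%d nal_unit_type=%d (%s)%n",
                    h, nalRefIdc(h), nalUnitType(h), describe(h));
        }
    }
}
```

So 0x27 and 0x28 are an SPS and a PPS with nal_ref_idc 1, and the 0x21 frame is a non-IDR slice.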

Anthony A
  • Probably simply because of bad input data. That is, the NALU you put into the input buffers. I don't think `onError()` gets called in the case of bad input data. Not sure what your `RTPFrame` class is doing to each frame, but in my experience, you need the NALU headers. If you post some examples of your `decoderInput` contents (first dozen bytes) then I'll compare them to what I have in my own code. – greeble31 Jan 29 '19 at 01:06
  • I don't see you putting any data in the input buffer. After accessing `ByteBuffer decoderInput = decoderInputs.poll();` you immediately call `decoderInput = frame.getFrame();`. Instead you should probably use `decoderInput.put(frame.getFrame());`. Also, I don't understand your SPS and PPS setting and checking. Note that theoretically these can change during streaming. – ChrisBe Jan 29 '19 at 09:35
  • @greeble31 I updated my post with a sample of NAL units I'm receiving from my socket and the frame once I've reconstructed the NAL header and concatenated the video data. Am I understanding you correctly in that you don't concatenate NAL's like that? The link I provided above suggests that the input to a MediaCodec decoder is an entire encoded frame; is it wrong? Should I just be feeding in single NAL units? – Anthony A Jan 29 '19 at 14:53
  • @AnthonyAngrimson No, I misspoke on that, I think you're depacketizing correctly. What about ChrisBe's observation? – greeble31 Jan 29 '19 at 15:17
  • @ChrisBe The SPS and PPS checks are simple there so I can apply the config flag when queuing the buffers (although I believe it does not actually matter.) As for the buffer assignment, I've changed it to 'decoderInput=decoderInputs.poll().put(frame.getFrame());' but there is no change in behavior. – Anthony A Jan 29 '19 at 15:45
  • @AnthonyAngrimson I agree with @greeble31 that no output usually means malformed or missing data/CSD. So what do `frame.SPS` and `frame.PPS` do? Are `SPS` and `PPS` NALU formatted? Note that `BUFFER_FLAG_CODEC_CONFIG` always matters. Are you updating `SPS` and `PPS`, or are you certain that they never change? How many frames are you feeding? Why is the `if (frame.isComplete())` block called twice, or have I got your code wrong? I also suggest that you use `Pair` with only one queue so you never mix up index and corresponding buffer. – ChrisBe Jan 29 '19 at 16:25
  • @ChrisBe My RTPFrame class determines whether or not the data passed to it is for an SPS or PPS frame, and that is what is represented by SPS and PPS; they're simple boolean values. I do receive new SPSs and PPSs occasionally, and I've already refactored my code to feed those into the decoder, but it shouldn't impact the decoding of those frames immediately following the first parameter sets. I have an indefinite number of frames; they're coming from a webcam. Could the presentation time be the problem? The time difference between frames is roughly 3000. – Anthony A Jan 29 '19 at 16:39
  • @ChrisBe Also, 'if (frame.isComplete())' is called twice because for sps and pps frames, frame will be complete after passing it the initial nal. Every other frame will need additional nal's passed to it, and those are checked by the second if statement. I do intend on handling NAL's out of order, but I'm trying to get the codec to work first and it hasn't been a big problem so far, since my development network quality is reliable. – Anthony A Jan 29 '19 at 17:00
  • Please update your code to reflect what you are actually running right now, and add a hex dump of a sample SPS & PPS `decoderInput`. – greeble31 Jan 29 '19 at 17:22
  • @AnthonyAngrimson Ah, now I get it. Presentation time is of no importance as it is only passed through. Also, I'm not sure if you need it, but flipping the ByteBuffer after populating it never hurts. Can you post full SPS and PPS NALUs? – ChrisBe Jan 29 '19 at 17:23
  • @ChrisBe done. I've also updated the code. – Anthony A Jan 29 '19 at 18:03
  • @greeble31 done. – Anthony A Jan 29 '19 at 18:08
  • Here's a sample CODEC_CONFIG array from my own code: [00 00 00 01 67 42 C0 29 8D 68 05 00 5B A0 1E 11 08 D4 00 00 00 01 68 CE 01 A8 35 C8]. Three differences I can see right away: 4-byte NALU start codes, it's one array instead of two, and the `nal_ref_idc` is 3 instead of 1. That means your 0x27/0x28 (4th element of each array) should actually be 0x67/0x68. Not sure how important that is. – greeble31 Jan 29 '19 at 20:02
  • Also, what's your MIME_TYPE? (just checkin') – greeble31 Jan 29 '19 at 20:03
  • In fact, my `nal_ref_idcs` are 3 on my video frames, too. And another thing -- the sample frame in your question has a `nal_unit_type` of 1, not 5. That means it's not an IDR (or "key") frame. Don't expect any output until you send a key frame. – greeble31 Jan 29 '19 at 20:08
  • @greeble31 Thanks. I believe both 0x00000001 and 0x000001 are acceptable for the prefix. The nal_ref_idc refers to transport priority; as long as it's non-zero, I don't believe it matters for decoding. I know that the example I gave was a non-IDR frame and I know not to expect video without them. MIME_TYPE is "video/avc". My gut says it shouldn't matter if the SPS and PPS are concatenated or not, but I'll give it a try. Something else I noticed is that onOutputFormatChanged is not called when I pass new SPS/PPS's. I wonder if I'm still not feeding data into the buffers properly... – Anthony A Jan 29 '19 at 20:23
  • I suspect you're right on all counts except possibly `onOutputFormatChanged()` behavior. If you wanted to verify that, you could plug in the SPS/PPS from my comment above. – greeble31 Jan 29 '19 at 23:08
  • I'm not having any luck; onOutputFormatChanged() is not called using @greeble31 's parameter set, and I can't even get MediaCodec to throw an error by passing junk data to it. I built a second RTPReader class that uses MediaCodec synchronously and I can't get an output buffer when calling dequeueOutputBuffer. I noticed that there are bytes at the beginning of each frame that increment in regular intervals. This resembles a DON, might this be the issue? Otherwise, I'll be opening up another thread to examine my RTPFrame class for issues, since that's the only other thing I can think of... – Anthony A Feb 01 '19 at 20:26
  • I wouldn't worry too much about the `onOutputFormatChanged()` thing or the lack of errors; sounds to me like expected behavior. Suggestion: To build up a little confidence, find yourself a known-good H.264 key frame, hard-code it into an array, and repeatedly submit it as input. If nothing pops out the other end, you'll know your problems are more fundamental than the RTP implementation. – greeble31 Feb 01 '19 at 21:33
  • @greeble31 I figured it out; the `position` of the bytebuffer returned by `getFrame()` was the same as `limit`, so `put()` was having no effect. I've posted an answer elaborating on it. – Anthony A Feb 04 '19 at 19:48

1 Answer

I'd like to thank @greeble31 and @ChrisBe for their help in figuring out the issue. The problem was indeed in my RTPFrame class, specifically in the method getFrame(). It iterated through the list of NAL unit data and used put() to add each unit to bb, which advanced the position of bb until it was equal to its limit. Since put(ByteBuffer) copies only the bytes between the source buffer's position and limit, putting the returned buffer into the decoder's input buffer copied nothing.

public ByteBuffer getFrame() {
    size = 4;
    int nalCount = nalUnits.size();
    for (int i = 0;i<nalCount;i++){
        size+=nalUnits.get(i).data.length;
    }

    ByteBuffer bb = ByteBuffer.allocate(size);
    bb.put(new byte[]{0,0,0,1});
    for(NALUnit unit:nalUnits){
        bb.put(unit.data);
    }
    return bb; // position == limit at this point, so a later put(bb) copies nothing
}

Changing return bb; to return (ByteBuffer)bb.position(0); fixed the issue.
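
To see the effect in isolation, here is a minimal sketch that does not involve MediaCodec at all. The class and method names are hypothetical, and it uses flip() instead of position(0); for a buffer that has been filled to capacity, as in getFrame(), the two should leave it in the same readable state.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; it mirrors the shape of getFrame() but is not the real code.
class ByteBufferPositionDemo {
    // Builds a start code plus payload, optionally rewinding the buffer for reading.
    static ByteBuffer buildFrame(byte[] payload, boolean rewind) {
        ByteBuffer bb = ByteBuffer.allocate(4 + payload.length);
        bb.put(new byte[]{0, 0, 0, 1});
        bb.put(payload);      // position is now equal to limit
        if (rewind) {
            bb.flip();        // limit = old position, position = 0
        }
        return bb;
    }

    // Copies src into dst and reports how many bytes were actually transferred.
    static int copyInto(ByteBuffer dst, ByteBuffer src) {
        int before = dst.position();
        dst.put(src);         // transfers only src.remaining() bytes
        return dst.position() - before;
    }

    public static void main(String[] args) {
        byte[] payload = {0x21, (byte) 0xE0, 0x00};
        // Stands in for a decoder input buffer.
        ByteBuffer input = ByteBuffer.allocate(64);
        System.out.println("without rewind: " + copyInto(input, buildFrame(payload, false)));
        System.out.println("with rewind:    " + copyInto(input, buildFrame(payload, true)));
    }
}
```

Without the rewind, 0 bytes are copied and no exception is thrown, which is consistent with the decoder receiving empty input and never producing output or an error. With the rewind, all 7 bytes are copied.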

Anthony A