
The basic issue I am trying to solve is to delay what is sent to a virtual display by a second or so: essentially, shifting every frame by one second after the initial recording. Note that a surface is used as input and another surface is used as output through this virtual display. My initial hunch is to explore a few ideas, given that modifying the Android framework or using non-public APIs is fine. Java or native C/C++ is fine.

a) I tried delaying frames posted to the virtual display or output surface by a second or two in SurfaceFlinger. This does not work, because it delays all surfaces by the same amount of time (frames are processed synchronously).

b) MediaCodec uses a surface as input to the encoder and then produces the encoded data. Is there any way to use MediaCodec such that it does not actually encode and only produces unencoded raw frames? It seems unlikely. Moreover, how does MediaCodec do this under the hood? It processes things frame by frame. If I can extrapolate that method, I might be able to extract frames one by one from my input surface and build a ring buffer delayed by the amount of time I require.
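For reference, this is roughly how I understand the surface-input encoder path is wired up (a minimal sketch; the resolution, bitrate, and frame-rate values are illustrative placeholders, not from my actual code):

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;

public class EncoderSetup {
    // Minimal sketch of a surface-input H.264 encoder; the resolution,
    // bitrate, and frame rate are illustrative placeholders.
    static Surface createEncoderInputSurface() throws Exception {
        MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 6000000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

        MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        Surface inputSurface = encoder.createInputSurface(); // producer draws here
        encoder.start();
        // Encoded output is then drained via dequeueOutputBuffer() on another
        // thread; I see no mode that yields raw frames instead.
        return inputSurface;
    }
}
```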

c) How do software decoders such as FFmpeg actually do this on Android? I assume they take in a surface, but how would they extract and process frames one by one?

Note that I can certainly encode and then decode to retrieve the frames and post them, but I want to avoid the encode/decode round trip. Again, modifying the Android framework or using non-public APIs is fine.

I also found this: Getting a frame from SurfaceView

It seems like option d) could be to use a SurfaceTexture, but I would like to avoid the encoding/decoding process.

  • This is a really hard question. Have you seen [this](http://bigflake.com/)? Or experimented with the MediaProjection API? I know you can get unencoded raw frames from screenrecord using the "--format raw" option. – Srini Mar 22 '16 at 22:28
  • Just to be clear: you're getting a continuous series of frames from somewhere (unspecified), then writing them to a virtual display you created, and you want to add a one-second lag between when you receive a frame and when you render it onto the virtual display? Sort of like the various Grafika camera demos, substituting VirtualDisplay for SurfaceView, but with a delay? – fadden Mar 22 '16 at 22:57
  • Yes, so the virtual display is recording layers and rendering them into the surface I provide. I need to shift the presentation time by, say, 1 second. Note that I am not adding 1-second delays between every frame, just shifting the presentation by 1 second. Haven't looked into Grafika, so I'm not sure. It seems I can leverage the code from screenrecord (Srinivas' link)? Or is there a better or more straightforward way? – John Smith Mar 23 '16 at 16:32
  • Sure, you can do this, but you need to consider a simple calculation. Say you have a 2560x1600 display (some 10" tablet) and you absolutely don't want to encode the frames as they come in, but want a one-second delay between when they are produced and when you consume them. You'll need 2560 * 1600 * 4 bytes/px * 60 fps ≈ 983 MB of RAM just to keep that one second's worth of unencoded frames in memory. Going forward with the implementation, nothing stops you from writing your own OpenGL renderer and publishing every frame from your buffer one second later. – Adrian Crețu Mar 23 '16 at 18:04
  • So the only way to do this is to process things frame by frame? Is there any other way to delay the initial viewing of a frame, or perhaps an easier way (through SurfaceFlinger, for example) to avoid using a ring buffer? – John Smith Mar 24 '16 at 16:29

1 Answer


As I understand it, you have a virtual display that is sending its output to a Surface. If you just use a SurfaceView for output, frames output by the virtual display appear on the physical display immediately. The goal is to introduce one second of latency between when the virtual display generates a frame and when the Surface consumer receives it, so that (again using SurfaceView as an example) the physical display shows everything a second late.

The basic concept is easy enough: send the virtual display output to a SurfaceTexture, and save the frame into a circular buffer; meanwhile another thread is reading frames out of the tail end of the circular buffer and displaying them. The trouble with this is what @AdrianCrețu pointed out in the comments: one second of full-resolution screen data at 60fps will occupy a significant fraction of the device's memory. Not to mention that copying that much data around will be fairly expensive, and some devices might not be able to keep up.
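In outline, the consumer side might look like the sketch below. Only the delay bookkeeping is shown; grabFrame() and renderFrame() are hypothetical stand-ins for the GLES readback and drawing you would attach to the SurfaceTexture, and the updateTexImage() handling on the GL thread is omitted:

```java
import android.graphics.SurfaceTexture;
import java.util.ArrayDeque;

// Sketch of the delay bookkeeping only. grabFrame() / renderFrame() are
// hypothetical stand-ins for real GLES readback and draw code.
class DelayedFrameBuffer implements SurfaceTexture.OnFrameAvailableListener {
    private static final long DELAY_NS = 1_000_000_000L;  // one second

    private static class Frame {
        final long timestampNs;
        final byte[] pixels;
        Frame(long t, byte[] p) { timestampNs = t; pixels = p; }
    }

    private final ArrayDeque<Frame> ring = new ArrayDeque<>();

    @Override
    public synchronized void onFrameAvailable(SurfaceTexture st) {
        // Producer side. In real code, updateTexImage() must run on the
        // thread that owns the GL context before the texture is read.
        ring.addLast(new Frame(st.getTimestamp(), grabFrame(st)));
    }

    // Consumer side, called from the render loop: draw every frame that
    // is now at least DELAY_NS old.
    public synchronized void drainTo(long nowNs) {
        while (!ring.isEmpty()
                && nowNs - ring.peekFirst().timestampNs >= DELAY_NS) {
            renderFrame(ring.pollFirst());
        }
    }

    private byte[] grabFrame(SurfaceTexture st) { return new byte[0]; } // stub
    private void renderFrame(Frame f) { }                               // stub
}
```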

(It doesn't matter whether you do it in the app or in SurfaceFlinger... the data for up to 60 screen-sized frames has to be held somewhere for a full second.)

You can reduce the volume of data in various ways:

  • Reduce the resolution. Scaling 2560x1600 down to 1280x800 removes 3/4 of the pixels. The loss of quality should be difficult to notice on most displays, but it depends on what you're viewing.
  • Reduce the color depth. Switching from ARGB8888 to RGB565 will cut the size in half, though the difference will be noticeable. (This and the previous item fall out of how the virtual display is configured; see the sketch after this list.)
  • Reduce the frame rate. You're generating the frames for the virtual display, so you can choose to update it more slowly. Animation is still reasonably smooth at 30fps, halving the memory requirements.
  • Apply image compression, e.g. PNG or JPEG. Fairly effective, but too slow without hardware support.
  • Encode inter-frame differences. If not much is changing from frame to frame, the incremental changes can be very small. Desktop-mirroring technologies like VNC do this. Somewhat slow to do in software.
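The first two items fall straight out of how the virtual display and its consumer are created. Here is a sketch, assuming a DisplayManager-based virtual display feeding an ImageReader; the names and values are illustrative, and not every device will accept RGB_565 buffers here:

```java
import android.content.Context;
import android.graphics.PixelFormat;
import android.hardware.display.DisplayManager;
import android.hardware.display.VirtualDisplay;
import android.media.ImageReader;

class ScaledVirtualDisplay {
    // Sketch: halve the resolution and request RGB_565 buffers so each
    // stored frame costs roughly 1/8 of a full-size ARGB8888 frame.
    // Fall back to PixelFormat.RGBA_8888 if 565 is not supported.
    static VirtualDisplay create(Context context, int fullWidth, int fullHeight) {
        ImageReader reader = ImageReader.newInstance(
                fullWidth / 2, fullHeight / 2,
                PixelFormat.RGB_565,   // 2 bytes/px instead of 4
                4);                    // a few frames in flight
        DisplayManager dm =
                (DisplayManager) context.getSystemService(Context.DISPLAY_SERVICE);
        return dm.createVirtualDisplay("delayed-display",
                fullWidth / 2, fullHeight / 2, /* densityDpi= */ 160,
                reader.getSurface(),
                DisplayManager.VIRTUAL_DISPLAY_FLAG_PRESENTATION);
    }
}
```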

A video codec like AVC will both compress frames and encode inter-frame differences. That's how you get 1 GByte/sec down to 10 Mbit/sec and still have it look pretty good.

Consider, for example, the "continuous capture" example in Grafika. It feeds the Camera output into a MediaCodec encoder, and stores the H.264-encoded output in a ring buffer. When you hit "capture", it saves the last 7 seconds. This could just as easily play the camera feed with a 7-second delay, and it only needs a few megabytes of memory to do it.
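Adapted to your problem, the playback side of that approach might look roughly like the sketch below. Codec setup is omitted, and EncodedRingBuffer is a hypothetical container for (encoded bytes, presentation time) pairs, not a real API. The decoder's output drain (not shown) would release each decoded frame to the output Surface once its shifted timestamp comes due:

```java
import android.media.MediaCodec;
import java.nio.ByteBuffer;

// Sketch of delayed playback from an encoded ring buffer.
// EncodedRingBuffer is hypothetical; codec setup/teardown is omitted.
class DelayedPlayback {
    static final long DELAY_US = 1_000_000L;  // one-second shift

    interface EncodedRingBuffer {
        void add(ByteBuffer encoded, MediaCodec.BufferInfo info); // copies bytes out
        boolean hasFrame();
        long popInto(ByteBuffer dst);  // fills dst, returns the original PTS in µs
    }

    // Copy each encoded frame out of the encoder into the ring buffer.
    void drainEncoder(MediaCodec encoder, EncodedRingBuffer ring) {
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int index = encoder.dequeueOutputBuffer(info, 10_000 /* µs timeout */);
        if (index >= 0) {
            ring.add(encoder.getOutputBuffer(index), info);
            encoder.releaseOutputBuffer(index, false);
        }
    }

    // Feed buffered frames to a surface-output decoder, shifting each
    // presentation timestamp by one second as it is queued.
    void feedDecoder(MediaCodec decoder, EncodedRingBuffer ring) {
        int inIndex = decoder.dequeueInputBuffer(10_000 /* µs timeout */);
        if (inIndex >= 0 && ring.hasFrame()) {
            ByteBuffer dst = decoder.getInputBuffer(inIndex);
            long ptsUs = ring.popInto(dst);  // leaves position() at end of data
            decoder.queueInputBuffer(inIndex, 0, dst.position(),
                    ptsUs + DELAY_US, 0);
        }
    }
}
```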

The "screenrecord" command can dump H.264 output or raw frames across the ADB connection, though in practice ADB is not fast enough to keep up with raw frames (even on tiny displays). It's not doing anything you can't do from an app (now that we have the mediaprojection API), so I wouldn't recommend using it as sample code.

If you haven't already, it may be useful to read through the graphics architecture doc.

fadden