
I have some code that allows me to detect faces in a live camera preview and draw a few GIFs over their landmarks using the play-services-vision library provided by Google.

It works well enough when the face is static, but when the face moves at a moderate speed, the face detector takes longer than one frame interval to detect the landmarks at the face's new position, so the overlays visibly lag behind. I know it might have something to do with bitmap draw speed, but I've already taken steps to minimize that lag.

(Basically I get complaints that the GIFs' repositioning isn't 'smooth enough')

EDIT: I did try moving the coordinate-detection code...

    List<Landmark> landmarksList = face.getLandmarks();
    for (Landmark current : landmarksList) {
        // Translate from detector coordinates to view coordinates once per landmark.
        float x = translateX(current.getPosition().x);
        float y = translateY(current.getPosition().y);
        switch (current.getType()) {
            case Landmark.LEFT_EYE:
                leftEyeX = x;
                leftEyeY = y;
                break;
            case Landmark.RIGHT_EYE:
                rightEyeX = x;
                rightEyeY = y;
                break;
            case Landmark.NOSE_BASE:
                noseBaseX = x;
                noseBaseY = y;
                break;
            case Landmark.BOTTOM_MOUTH:
                botMouthX = x;
                botMouthY = y;
                break;
            case Landmark.LEFT_MOUTH:
                leftMouthX = x;
                leftMouthY = y;
                break;
            case Landmark.RIGHT_MOUTH:
                rightMouthX = x;
                rightMouthY = y;
                break;
        }
    }
    eyeDistance = (float) Math.hypot(rightEyeX - leftEyeX, rightEyeY - leftEyeY);
    eyeCenterX = (rightEyeX + leftEyeX) / 2;
    eyeCenterY = (rightEyeY + leftEyeY) / 2;
    noseToMouthDist = (float) Math.hypot(leftMouthX - noseBaseX, leftMouthY - noseBaseY);

...in a separate thread within the View's draw method, but it just nets me a SIGSEGV error.

My questions:

  1. Is syncing the Face Detector's processing speed with the Camera Preview framerate the right thing to do in this case, or is it the other way around, or is it some other way?
  2. As the Face Detector finds the faces in a camera preview frame, should I drop the frames that the preview feeds before the FD finishes? If so, how can I do it?
  3. Should I just use setClassificationType(NO_CLASSIFICATIONS) and setTrackingEnabled(false) in a camera preview just to make the detection faster?
  4. Does the play-services-vision library use OpenCV, and which is actually better?

EDIT 2:

I read one research paper claiming that OpenCV's face detection and related functions run faster on Android. I was wondering whether I can leverage that to speed up the face detection.

Gensoukyou1337

2 Answers


There is no way to guarantee that face detection will be fast enough to show no visible delay, even when the head motion is moderate. Even if you manage to optimize the hell out of it on your development device, you will surely find another model among the thousands out there that is too slow.

Your code should be resilient to such situations. You can predict the face position a split second ahead, assuming that it moves smoothly. If the user suddenly twitches their head or the device, no algorithm can help.
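A minimal sketch of such prediction, using parabolic extrapolation over the last three positions (assuming uniform frame intervals; the class and method names are illustrative, not part of any library):

```java
// Predict the next position by fitting a parabola through the last three
// samples p0, p1, p2, assumed uniformly spaced in time at t = 0, 1, 2.
// Evaluating the quadratic fit at t = 3 simplifies to 3*p2 - 3*p1 + p0.
public final class PositionPredictor {
    public static float predictNext(float p0, float p1, float p2) {
        return 3f * p2 - 3f * p1 + p0;
    }
}
```

Run this per coordinate (x and y separately) for each landmark, and generate predicted positions at the display rate even when detections arrive more slowly.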

If you use the deprecated Camera API, you should pre-allocate a buffer and use setPreviewCallbackWithBuffer(). This way you can guarantee that the frames arrive at your image processor one at a time. You should also remember to open the Camera on a background thread, so that the [onPreviewFrame()](http://developer.android.com/reference/android/hardware/Camera.PreviewCallback.html#onPreviewFrame(byte[], android.hardware.Camera)) callback, where your heavy image processing takes place, does not block the UI thread.
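A rough sketch of that setup with the deprecated android.hardware.Camera API (the buffer size assumes the default NV21 preview format, and processFrame() is a placeholder for your own processing):

```java
// Sketch: pre-allocated buffer + background HandlerThread for the
// deprecated android.hardware.Camera API.
HandlerThread cameraThread = new HandlerThread("camera");
cameraThread.start();
new Handler(cameraThread.getLooper()).post(() -> {
    Camera camera = Camera.open();
    Camera.Size size = camera.getParameters().getPreviewSize();
    // NV21 uses 12 bits (1.5 bytes) per pixel.
    int bufferSize = size.width * size.height * 3 / 2;
    camera.addCallbackBuffer(new byte[bufferSize]);
    camera.setPreviewCallbackWithBuffer((data, cam) -> {
        processFrame(data);          // your heavy image processing (placeholder)
        cam.addCallbackBuffer(data); // hand the buffer back for reuse
    });
    camera.startPreview();
});
```

Because there is a single buffer, the camera cannot deliver a new frame until you return it, which naturally throttles delivery to your processing speed.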

  • Yes, OpenCV face-detection may be faster in some cases, but more importantly it is more robust than the Google face detector.
  • Yes, it's better to turn the classifier off if you don't care about smiles and open eyes. The performance gain may vary.
  • I believe that turning tracking off will only slow the Google face detector down, but you should make your own measurements, and choose the best strategy.
  • The most significant gain can be achieved by turning setProminentFaceOnly() on, but again I cannot predict the actual effect of this setting for your device.
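For reference, the settings discussed above are all configured on FaceDetector.Builder; a sketch of one plausible combination (the right trade-offs depend on your device, so measure each option yourself):

```java
FaceDetector detector = new FaceDetector.Builder(context)
        .setProminentFaceOnly(true)                             // only the most central face
        .setClassificationType(FaceDetector.NO_CLASSIFICATIONS) // skip smile/eyes-open
        .setLandmarkType(FaceDetector.ALL_LANDMARKS)            // still need eye/nose/mouth positions
        .setTrackingEnabled(true)                               // measure on/off on your device
        .setMode(FaceDetector.FAST_MODE)
        .build();
```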
Alex Cohn
  • I'm working on the location prediction thing you mentioned in the second paragraph. How does it work? Is it as simple as getting the difference in X and Y coordinates between the current face position and the previous iteration's face position, multiplying them by 2, and adding them to the current face's X and Y coordinates? – Gensoukyou1337 May 13 '16 at 03:41
  • I would probably try parabolic extrapolation: given 3 last positions, fit them on a curve and calculate x,y for some moments in the future, at the rate of 30 FPS – Alex Cohn May 13 '16 at 03:50
  • Can I use the Apache commons math library for this? – Gensoukyou1337 May 13 '16 at 07:31
  • I don't know this lib closely. The important point is to generate expected positions at higher rate than you calculate the actual ones. – Alex Cohn May 13 '16 at 14:17

There's always going to be some lag, since any face detector takes some amount of time to run. By the time you draw the result, you will usually be drawing it over a future frame in which the face may have moved a bit.

Here are some suggestions for minimizing lag:

The CameraSource implementation provided by Google's vision library automatically handles dropping preview frames when needed so that it can keep up the best that it can. See the open source version of this code if you'd like to incorporate a similar approach into your app: https://github.com/googlesamples/android-vision/blob/master/visionSamples/barcode-reader/app/src/main/java/com/google/android/gms/samples/vision/barcodereader/ui/camera/CameraSource.java#L1144
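The core idea in that CameraSource code is to keep only the most recent frame and silently drop anything that arrives while the detector is busy. A simplified, library-independent sketch of that pattern (the class name and frame type are placeholders):

```java
import java.util.concurrent.atomic.AtomicReference;

// Keeps only the newest pending frame; older undelivered frames are dropped.
public final class LatestFrameHolder<T> {
    private final AtomicReference<T> pending = new AtomicReference<>();

    // Called from the camera callback: replaces (drops) any unprocessed frame.
    public void offer(T frame) {
        pending.set(frame);
    }

    // Called from the processing loop: takes the newest frame, or null if none.
    public T take() {
        return pending.getAndSet(null);
    }
}
```

The detector loop calls take() whenever it finishes a frame, so it always works on the freshest data rather than a growing backlog.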

Using a lower camera preview resolution, such as 320x240, will make face detection faster.

If you're only tracking one face, using the setProminentFaceOnly() option will make face detection faster. Combining this with LargestFaceFocusingProcessor will make it faster still.

To use LargestFaceFocusingProcessor, set it as the processor of the face detector. For example:

    Tracker<Face> tracker = *your face tracker implementation*;
    detector.setProcessor(
        new LargestFaceFocusingProcessor.Builder(detector, tracker).build());

Your tracker implementation will receive face updates for only the largest face that it initially finds. In addition, it will signal back to the detector that it only needs to track that face for as long as it is visible.
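A minimal tracker skeleton for that setup (the callback bodies here are placeholders to fill in):

```java
Tracker<Face> tracker = new Tracker<Face>() {
    @Override
    public void onNewItem(int id, Face face) {
        // The prominent face was first detected.
    }
    @Override
    public void onUpdate(Detector.Detections<Face> detections, Face face) {
        // New position/landmarks for the tracked face; update your overlay here.
    }
    @Override
    public void onMissing(Detector.Detections<Face> detections) {
        // The face was not found in this frame.
    }
    @Override
    public void onDone() {
        // The face left the preview; tracking for it has ended.
    }
};
```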

If you don't need to detect smaller faces, setting setMinFaceSize() to a larger value will make face detection faster, since the detector doesn't need to spend time looking for smaller faces.

You can turn off classification if you don't need the eyes-open or smile indications. However, this only gives a small speed advantage.

Using the tracking option will make this faster as well, but at some expense of accuracy. This uses a predictive algorithm on some intermediate frames, to avoid the expense of running full face detection on every frame.

pm0733464