10

I've read the documentation on all the ARKit classes up and down. I don't see any place that describes the ability to actually get the texture of the user's face.

ARFaceAnchor contains the ARFaceGeometry (topology and geometry comprised of vertices) and the BlendShapeLocation coefficients (which allow manipulation of individual facial traits by applying geometric math to the vertices of the user's face).

But where can I get the actual texture of the user's face? For example: the actual skin tone / color / texture, facial hair, and other unique traits such as scars or birthmarks. Or is this not possible at all?

FranticRock
  • 3,233
  • 1
  • 31
  • 56

3 Answers

10

You want a texture-map-style image for the face? There’s no API that gets you exactly that, but all the information you need is there:

  • ARFrame.capturedImage gets you the camera image.
  • ARFaceGeometry gets you a 3D mesh of the face.
  • ARAnchor and ARCamera together tell you where the face is in relation to the camera, and how the camera relates to the image pixels.

So it’s entirely possible to texture the face model using the current video frame image. For each vertex in the mesh...

  1. Convert the vertex position from model space to world space (using the face anchor's transform)
  2. Project that point with the camera (ARCamera.projectPoint, or multiply by the view and projection matrices yourself) to get pixel coordinates in the captured image
  3. Divide by the image width/height to get normalized texture coordinates

This gets you texture coordinates for each vertex, which you can then use to texture the mesh using the camera image. You could do this math either all at once to replace the texture coordinate buffer ARFaceGeometry provides, or do it in shader code on the GPU during rendering. (If you’re rendering using SceneKit / ARSCNView you can probably do this in a shader modifier for the geometry entry point.)
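
For concreteness, here is a rough sketch of the shader-modifier route. This is not Apple's sample code: the `displayTransform` argument and the `faceNode` variable are placeholders, and the display transform would need to be supplied each frame from `ARFrame.displayTransform(for:viewportSize:)` (possibly inverted, depending on how you set things up).

// Geometry-entry-point shader modifier: rewrite each vertex's texture coordinates
// so the material samples the camera image at the point currently "behind" it.
let videoTexcoords = """
#pragma arguments
float4x4 displayTransform; // accounts for orientation/mirroring of the captured image

#pragma body
// Vertex in camera space, then clip space, then NDC after the perspective divide.
float4 vertexCamera = scn_node.modelViewTransform * _geometry.position;
float4 vertexClip = scn_frame.projectionTransform * vertexCamera;
vertexClip /= vertexClip.w;

// NDC [-1, 1] -> normalized image coordinates [0, 1], with a Y flip.
float4 vertexImage = float4(vertexClip.xy * 0.5 + 0.5, 0.0, 1.0);
vertexImage.y = 1.0 - vertexImage.y;

_geometry.texcoords[0] = (displayTransform * vertexImage).xy;
"""

// Attach the modifier to the face material, with the camera image as its diffuse contents.
faceNode.geometry?.firstMaterial?.shaderModifiers = [.geometry: videoTexcoords]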

If instead you want to know for each pixel in the camera image what part of the face geometry it corresponds to, it’s a bit harder. You can’t just reverse the above math because you’re missing a depth value for each pixel... but if you don’t need to map every pixel, SceneKit hit testing is an easy way to get geometry for individual pixels.
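
As a minimal sketch of that hit-testing idea (assuming `sceneView` is your ARSCNView; note the point is in view coordinates, not camera-image pixels, so you may still need ARFrame.displayTransform to convert between the two):

let point = CGPoint(x: 200, y: 350) // some pixel of interest, in view coordinates
let hits = sceneView.hitTest(point, options: nil)
if let hit = hits.first(where: { $0.node.geometry is ARSCNFaceGeometry }) {
    print("triangle index:", hit.faceIndex)           // which face-mesh triangle was hit
    print("position on mesh:", hit.localCoordinates)  // where on the mesh, in the face's local space
    print("texture coords:", hit.textureCoordinates(withMappingChannel: 0))
}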


If what you’re actually asking for is landmark recognition — e.g. where in the camera image are the eyes, nose, beard, etc — there’s no API in ARKit for that. The Vision framework might help.
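
For completeness, a minimal Vision sketch (my own example, not part of ARKit); `frame` stands for the current ARFrame, and the `.right` orientation is an assumption for a portrait device:

import Vision

let request = VNDetectFaceLandmarksRequest { request, _ in
    guard let faces = request.results as? [VNFaceObservation] else { return }
    for face in faces {
        // Landmark points are normalized to the face's bounding box.
        if let leftEye = face.landmarks?.leftEye {
            print("left eye points:", leftEye.normalizedPoints)
        }
    }
}
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                    orientation: .right,
                                    options: [:])
try? handler.perform([request])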

rickster
  • 124,678
  • 26
  • 272
  • 326
  • Thank you very much for the great answer. I'm following the approach of manipulating the texture coordinate buffer from ARFaceGeometry, and it's looking promising. – FranticRock Nov 14 '17 at 21:50
  • @rickster Can I assume (with the "all at once" technique above) that rather than attempting to manipulate the camera image to fit the texture coordinates, one should leave the camera image alone and manipulate the texture coords in the .obj file instead? – coco Nov 30 '17 at 15:17
  • @coco What obj file? `ARFaceGeometry` provides a new face mesh, with vertex positions updated to match the current pose/expression of the face, *on every frame*. So “all at once” is “all at once *per frame*”; that is, each time you get a new anchor with updated geometry, you run through its vertex buffer and generate a new texture coordinates buffer mapping each vertex to the point in the video image *currently* “behind” that vertex. – rickster Nov 30 '17 at 18:18
  • In neither of my suggested approaches do you manipulate the image — it’s all about manipulating texture coordinates (using vertex position data) so that your texture sample into the image gets you pixels matching where the face mesh currently is. “All at once” means processing the whole vertex buffer (likely on CPU); the alternative is to do it on the GPU *during* render time, since messing with vertex attributes (like position/texcoord) is exactly what vertex shaders are for. – rickster Nov 30 '17 at 18:22
  • Thank you @rickster. In my exploration of this, I'm first working on a single frame, which is why I'm exporting the data as an .obj file, to more easily view the result. – coco Nov 30 '17 at 19:44
  • How do you compute the RGBA color information from each pixel in ARFrame.capturedImage (CVPixelBuffer) using just the CPU (without using Metal GPU)? – Luis B Apr 02 '18 at 00:44
  • @LuisB `ARSCNView` has a `snapshot()` method that returns a `UIImage` in RGB instead of YUV – Juan Boero Aug 28 '19 at 21:24
  • @rickster I want to know, for each pixel in the camera image, what part of the face geometry it corresponds to. Can you elaborate on that one? I also asked the question [here](https://stackoverflow.com/q/59145124/1634890) – Juan Boero Dec 18 '19 at 16:50
7

I've put together a demo iOS app that shows how to accomplish this. The demo captures a face texture map in realtime, applying it back to an ARSCNFaceGeometry to create a textured 3D model of the user's face.

Below you can see the realtime textured 3D face model in the top left, overlaid on top of the AR front facing camera view:

Textured 3D face model in the top left, overlaid on top of the AR front-facing camera view

The demo works by rendering an ARSCNFaceGeometry, but instead of rendering it normally, it renders the mesh in texture space while still using the original vertex positions to determine where to sample from in the captured pixel data.

Here are links to the relevant parts of the implementation:

Almost all the work is done in a Metal render pass, so it easily runs in realtime.
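
To make the idea concrete, here is a hypothetical Metal vertex function, embedded as a Swift string. It is not the demo's actual code, and the attribute/buffer layout and uniform names are assumptions; it just shows the trick of rasterizing the mesh in texture space while sampling the camera image where each vertex currently projects.

let unwrapVertexFunction = """
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float2 texCoord [[attribute(1)]];
};

struct VertexOut {
    float4 position [[position]];
    float2 cameraUV;   // where the fragment stage samples ARFrame.capturedImage
};

vertex VertexOut unwrapFaceVertex(VertexIn in [[stage_in]],
                                  constant float4x4 &modelMatrix [[buffer(1)]],
                                  constant float4x4 &viewProjection [[buffer(2)]])
{
    VertexOut out;
    // Rasterize into texture space: the mesh's UVs become the output position
    // (a Y flip may be needed depending on your texture origin convention).
    out.position = float4(in.texCoord * 2.0 - 1.0, 0.0, 1.0);

    // But sample the camera image wherever this vertex currently projects to.
    float4 clip = viewProjection * modelMatrix * float4(in.position, 1.0);
    float2 ndc = clip.xy / clip.w;
    out.cameraUV = float2(ndc.x * 0.5 + 0.5, 1.0 - (ndc.y * 0.5 + 0.5));
    return out;
}
"""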

I've also put together some notes covering the limitations of the demo.


If you instead want a 2D image of the user's face, you can try doing the following:

  • Render the transformed ARSCNFaceGeometry to a 1-bit buffer to create an image mask. Basically you just want places where the face model appears to be white, while everything else should be black.

  • Apply the mask to the captured frame image.

This should give you an image with just the face (although you will likely need to crop the result).
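
A minimal sketch of that masking step using Core Image (the function and image names are placeholders; it assumes you have already rendered the mask and converted both it and the captured frame to CIImage):

import CoreImage

func maskFace(frame: CIImage, mask: CIImage) -> CIImage? {
    // White areas of the mask keep the frame; black areas fall back to a black background.
    let filter = CIFilter(name: "CIBlendWithMask")
    filter?.setValue(frame, forKey: kCIInputImageKey)
    filter?.setValue(CIImage(color: .black).cropped(to: frame.extent),
                     forKey: kCIInputBackgroundImageKey)
    filter?.setValue(mask, forKey: kCIInputMaskImageKey)
    return filter?.outputImage
}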

Matt Bierner
  • 58,117
  • 21
  • 175
  • 206
  • Hi Matt, it's a great example, but if we want the same thing with ARMesh (introduced in ARKit 3.5) instead of the face, can we achieve it? If yes, can you please explain a little how we can do it? – Ali Aeman Apr 12 '21 at 07:50
  • 1
    The core logic should be the same; you'd just update `FaceTextureGenerator.swift` to pull in geometry from an ARMesh that tracks the user's face instead of pulling from a SceneKit geometry – Matt Bierner Apr 12 '21 at 20:54
  • @MattBierner could you please tell us the difference between your project and the Apple demo project of Augmented Faces? Also the idea was to get the texture out to the UVW map of the face so it can map back and forth, thanks! – Juan Boero Apr 13 '21 at 18:45
  • The Apple video face texture demo draws the face geometry with a fragment shader that samples from the captured image (it's more like a post-processing effect). My example generates a new texture map for the face geometry every frame. You can then apply this texture back to a separate face geometry that can be transformed independently of the user's actual face – Matt Bierner Apr 13 '21 at 20:19
  • @MattBierner I have the same question as Ali Aeman. Can you provide a sample to do that? – Vidhya Sri Apr 15 '21 at 06:29
  • @MattBierner ARMeshAnchor doesn't have any UV (.texcoord) information – masaldana2 Apr 19 '21 at 22:48
3

You can calculate the texture coordinates as follows:

// Assumes you have the current ARFrame (`arFrame`) and the ARFaceAnchor (`faceAnchor`).
let geometry = faceAnchor.geometry
let vertices = geometry.vertices
let size = arFrame.camera.imageResolution
let camera = arFrame.camera

// The anchor's transform takes the vertices from the face's model space into world space.
let modelMatrix = faceAnchor.transform

let textureCoordinates = vertices.map { vertex -> vector_float2 in
    let vertex4 = vector_float4(vertex.x, vertex.y, vertex.z, 1)
    let world_vertex4 = simd_mul(modelMatrix, vertex4)
    let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)
    // Project the world-space point into pixel coordinates of the captured image.
    let pt = camera.projectPoint(world_vector3,
        orientation: .portrait,
        viewportSize: CGSize(
            width: CGFloat(size.height),
            height: CGFloat(size.width)))
    // Normalize to [0, 1] texture coordinates (swapped/flipped for portrait orientation).
    let v = 1.0 - Float(pt.x) / Float(size.height)
    let u = Float(pt.y) / Float(size.width)
    return vector_float2(u, v)
}
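
If you then want to use these coordinates with SceneKit, one option (a sketch, assuming you rebuild the geometry rather than mutate ARSCNFaceGeometry in place; `cameraImage` is a placeholder for a texture made from ARFrame.capturedImage) is to wrap them in an SCNGeometrySource and combine them with the face's vertices and triangle indices:

let texcoordData = Data(bytes: textureCoordinates,
                        count: textureCoordinates.count * MemoryLayout<vector_float2>.stride)
let texcoordSource = SCNGeometrySource(data: texcoordData,
                                       semantic: .texcoord,
                                       vectorCount: textureCoordinates.count,
                                       usesFloatComponents: true,
                                       componentsPerVector: 2,
                                       bytesPerComponent: MemoryLayout<Float>.size,
                                       dataOffset: 0,
                                       dataStride: MemoryLayout<vector_float2>.stride)
let vertexSource = SCNGeometrySource(vertices: vertices.map { SCNVector3(x: $0.x, y: $0.y, z: $0.z) })
let element = SCNGeometryElement(indices: geometry.triangleIndices,
                                 primitiveType: .triangles)

let texturedFace = SCNGeometry(sources: [vertexSource, texcoordSource], elements: [element])
let material = SCNMaterial()
material.diffuse.contents = cameraImage // placeholder: a texture of the captured camera frame
texturedFace.materials = [material]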
Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
ansont
  • 71
  • 3
  • 1
    So, with this I can know where each image pixel maps to in texture coordinates at that moment? Let's say I want to create a diffuse map by sampling the face: for each pixel, paint the diffuse map – Juan Boero Dec 18 '19 at 16:44
  • 1
    I have applied this to an SCNGeometry texture coordinates source, but it produces a weird result.. any idea? – MJ Studio Feb 19 '20 at 00:48