
I am trying to use the ARCamera matrices to convert a 3D point to 2D in ARKit/SceneKit. Previously I used `projectPoint` to get the projected x and y coordinates, which worked fine. However, the app slowed down significantly and would crash when appending long recordings.

So I turned to another approach: using the ARCamera parameters to do the conversion myself. The Apple documentation for projectionMatrix does not say much, so I looked into the theory behind projection matrices in The Perspective and Orthographic Projection Matrix and Metal Tutorial. My understanding is that, given a 3D point P = (x, y, z), in theory we should be able to get the 2D point simply as P'(2D) = projectionMatrix * P(3D), followed by a divide by the resulting w.
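As a sanity check of that multiply-then-divide idea, here is a minimal sketch of the math in plain Swift, using arrays instead of `simd` so it is self-contained (the column-major layout assumed here matches how `simd_float4x4` stores its columns):

```swift
// Column-major 4x4 matrix applied to a homogeneous point,
// followed by the perspective divide.
typealias Vec4 = [Double]      // [x, y, z, w]
typealias Mat4 = [[Double]]    // four columns, each a Vec4

func multiply(_ m: Mat4, _ p: Vec4) -> Vec4 {
    var out = [0.0, 0.0, 0.0, 0.0]
    for row in 0..<4 {
        for col in 0..<4 {
            out[row] += m[col][row] * p[col]   // column-major indexing
        }
    }
    return out
}

// Project a 3D point: append w = 1, multiply, then divide by w'.
// The result is in normalized device coordinates (-1...1), not pixels.
func project(_ m: Mat4, _ point: [Double]) -> (x: Double, y: Double) {
    let clip = multiply(m, [point[0], point[1], point[2], 1.0])
    return (clip[0] / clip[3], clip[1] / clip[3])
}
```

Note that the output is in normalized device coordinates, so a further viewport transform is needed to get pixel coordinates.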

Assuming that is the case, I did:

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Use the frame passed to the delegate directly instead of
        // re-fetching session.currentFrame through an optional.
        let arCamera = frame.camera
        // intrinsics: a matrix that converts between the 2D camera plane and 3D world coordinate space.
        // projectionMatrix: a transform matrix appropriate for rendering 3D content to match the image captured by the camera.
        print("ARCamera ProjectionMatrix = \(arCamera.projectionMatrix)")
        print("ARCamera Intrinsics = \(arCamera.intrinsics)")
    }

I am able to get the projection matrix and the intrinsics (I even tried the intrinsics to see whether they change), but they are the same for every frame.

ARCamera ProjectionMatrix = simd_float4x4([[1.774035, 0.0, 0.0, 0.0], [0.0, 2.36538, 0.0, 0.0], [-0.0011034012, 0.00073593855, -0.99999976, -1.0], [0.0, 0.0, -0.0009999998, 0.0]])
ARCamera Intrinsics = simd_float3x3([[1277.3052, 0.0, 0.0], [0.0, 1277.3052, 0.0], [720.29443, 539.8974, 1.0]])...

I am not sure I understand what is happening here, as I expected the projection matrix to be different for each frame. Can someone explain the theory behind the projection matrix in SceneKit/ARKit and validate my approach? Am I using the right matrix, or am I missing something in the code?

Thank you so much in advance!

swiftlearneer

1 Answer


You'd likely need to use the camera's transform (view) matrix as well; that is what changes between frames, as the user moves the real-world camera and the virtual camera's transform is updated to match it. The projection matrix, by contrast, depends only on the lens parameters, which is why it stays constant. Composing the view matrix with the projection matrix should get you into clip space, and from there into screen space.
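To make the chain concrete: world point → view space (ARKit's `camera.viewMatrix(for:)`) → clip space (`projectionMatrix`) → perspective divide to NDC → viewport transform to pixels. The matrix stages are what the question already covers, so here is a minimal sketch of just the final viewport step, in plain Swift so it runs anywhere; the top-left screen origin is an assumption matching UIKit's convention:

```swift
// Map normalized device coordinates (-1...1, y up) to screen pixels
// with the origin at the top-left, as UIKit expects.
func ndcToScreen(ndcX: Double, ndcY: Double,
                 width: Double, height: Double) -> (x: Double, y: Double) {
    let sx = (ndcX + 1.0) * 0.5 * width
    let sy = (1.0 - (ndcY + 1.0) * 0.5) * height   // flip y for top-left origin
    return (sx, sy)
}
```

For example, the NDC origin (0, 0) lands at the center of the viewport, and (-1, 1) lands at the top-left pixel (0, 0).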

Dennis L