
Let's say I have a single photo (taken with the iOS camera) that contains a known image target (e.g. a square QR code, 5cm x 5cm) lying on a flat plane. Can I use the Apple Vision framework to calculate the 6DoF pose of the image target?

I'm unfamiliar with the framework, but it seems to me that this problem is similar to the tracking of AR Targets, and so I'm hoping that there is a solution in there somewhere!

In fact, what I actually want to do is detect shapes in the static image (using an existing cloud-hosted OpenCV app) and display those shapes in AR using ARKit. I was hoping I could have the same image targets present in both the static images and in the AR video feed.

Andy Jazz
Adrian Taylor

1 Answer


Obtaining ARCamera position

In ARKit you can acquire the ARCamera's position through the ARFrame's camera property. Each ARFrame (delivered up to 60 times per second) contains a 4x4 camera transform matrix. To keep the ARCamera's position up to date, use the delegate method called renderer(_:didUpdate:for:).

Here's "initial" method called renderer(_:didAdd:for:):

extension ViewController: ARSCNViewDelegate {

    func renderer(_ renderer: SCNSceneRenderer, 
                 didAdd node: SCNNode, 
                  for anchor: ARAnchor) {
    
        // The current frame carries the camera's 4x4 transform;
        // column 3 holds the translation (position in world space)
        let frame = sceneView.session.currentFrame
    
        print(frame?.camera.transform.columns.3.x as Any)
        print(frame?.camera.transform.columns.3.y as Any)
        print(frame?.camera.transform.columns.3.z as Any)

        // ...
    }
}
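Since the question asks for a full 6DoF pose, note that the same 4x4 matrix also carries the orientation. A minimal sketch (assuming you have a valid ARFrame) reading both translation and rotation:

```swift
import ARKit

// Sketch: the 4x4 camera transform holds all six degrees of freedom.
// Column 3 is the translation; ARCamera also exposes the rotation
// directly as Euler angles (in radians).
func printCameraPose(of frame: ARFrame) {
    let position = frame.camera.transform.columns.3
    print("position:", position.x, position.y, position.z)  // 3 translation DoF
    print("orientation:", frame.camera.eulerAngles)         // 3 rotation DoF
}
```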


Obtaining anchor coordinates and image size

When you're using Vision and ARKit together, the simplest way to obtain the coordinates of a tracked image in ARKit is to use the transform instance property of ARImageAnchor, expressed as a SIMD 4x4 matrix.

var transform: simd_float4x4 { get }

This matrix encodes the position, orientation, and scale of the anchor relative to the world coordinate space of the AR session the anchor is placed in.


Here's how your code might look:

extension ViewController: ARSCNViewDelegate {

    func renderer(_ renderer: SCNSceneRenderer, 
                 didAdd node: SCNNode, 
                  for anchor: ARAnchor) {
    
        guard let imageAnchor = anchor as? ARImageAnchor
        else { return }
    
        // Column 3 of the anchor's transform is its world-space position
        print(imageAnchor.transform.columns.3.x)
        print(imageAnchor.transform.columns.3.y)
        print(imageAnchor.transform.columns.3.z)

        // ...
    }
}

If you want to know what a SIMD 4x4 matrix is, read this post.


Also, to obtain the physical size (in meters) of a tracked photo, use this property:

// set in Xcode's `AR Resources` Group
imageAnchor.referenceImage.physicalSize        

To calculate the factor between the initial size and the estimated physical size, use this property:

imageAnchor.estimatedScaleFactor
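Putting the two properties together (inside one of the delegate methods above, where imageAnchor is available), here's a sketch of estimating the real-world size. Note that estimatedScaleFactor only deviates from 1.0 when automaticImageScaleEstimationEnabled is set on the ARWorldTrackingConfiguration:

```swift
// Sketch: scale the reference image's declared physical size by the
// factor ARKit estimated from the scene.
let reference = imageAnchor.referenceImage.physicalSize    // meters
let scale = CGFloat(imageAnchor.estimatedScaleFactor)
let estimatedSize = CGSize(width: reference.width * scale,
                           height: reference.height * scale)
print("estimated physical size:", estimatedSize)
```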


Updating anchor coordinates and image size

To continuously update the coordinates of the ARImageAnchor and the image size, use the second method from ARSCNViewDelegate:

optional func renderer(_ renderer: SCNSceneRenderer, 
                   didUpdate node: SCNNode, 
                       for anchor: ARAnchor)
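A minimal implementation of this method might look like the following (mirroring the didAdd version above):

```swift
extension ViewController: ARSCNViewDelegate {

    func renderer(_ renderer: SCNSceneRenderer,
               didUpdate node: SCNNode,
                   for anchor: ARAnchor) {

        guard let imageAnchor = anchor as? ARImageAnchor else { return }

        // Called each time ARKit refines the anchor, so the pose
        // printed here stays up to date
        print(imageAnchor.transform.columns.3)
    }
}
```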

To obtain the bounding box (a CGRect) of your photo in Vision, use the boundingBox instance property of a VNDetectedObjectObservation returned by your detection request:

observation.boundingBox    // normalized coordinates, origin at bottom-left
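One way such an observation is produced (a sketch using VNDetectRectanglesRequest as a stand-in for your detector; cgImage is assumed to be your source photo):

```swift
import Vision

// Sketch: boundingBox belongs to observations returned by a request;
// you don't construct a VNDetectedObjectObservation yourself.
let request = VNDetectRectanglesRequest { request, _ in
    guard let results = request.results as? [VNRectangleObservation] else { return }
    for observation in results {
        // Normalized Vision coordinates: origin bottom-left, range 0...1
        print(observation.boundingBox)
    }
}
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try? handler.perform([request])
```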