
What am I looking for?

A simple explanation of my requirement is this:

  1. Using ARKit, detect an object using the iPhone camera
  2. Find the position of this object in virtual space
  3. Place a 3D object in this virtual space using SceneKit. The 3D object should be behind the marker.

An example would be to detect the position of a small image/marker in 3D space using the camera, then place a 3D ball model behind this marker in virtual space (so the ball will be hidden from the user because the marker/image is in front of it).

What am I able to do so far?

  1. I am able to detect a marker/image using ARKit
  2. I am able to position a 3D ball model on the screen.

What is my problem?

I am unable to position the ball so that it appears behind the detected marker.

When the ball is in front of the marker, the ball correctly hides the marker. You can see in the side view that the ball is in front of the marker. See below.

[Screenshot: side view with the ball in front of the marker, correctly hiding it]

But when the ball is behind the marker, the opposite doesn't happen. The ball is always visible in front, blocking the marker. I expected the marker to hide the ball, so the scene is not respecting the z depth of the ball's position. See below.

[Screenshot: side view with the ball behind the marker, yet still rendered in front of it]

Code

Please look at the comments as well.

override func viewDidLoad() {
    super.viewDidLoad()

    sceneView.delegate = self
    sceneView.autoenablesDefaultLighting = true

    //This loads my 3d model.
    let ballScene = SCNScene(named: "art.scnassets/ball.scn")
    ballNode = ballScene?.rootNode

    //The model I have is too big. Scaling it here.
    ballNode?.scale = SCNVector3Make(0.1, 0.1, 0.1)
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)

    //I am trying to detect a marker/image. So ImageTracking configuration is enough
    let configuration = ARImageTrackingConfiguration()

    //Load the image/marker and set it as tracking image
    //There is only one image in this set
    if let trackingImages = ARReferenceImage.referenceImages(inGroupNamed: "Markers",
                              bundle: Bundle.main) {
        configuration.trackingImages = trackingImages
        configuration.maximumNumberOfTrackedImages = 1
    }

    sceneView.session.run(configuration)
}

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    sceneView.session.pause()
}


func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    let node = SCNNode()

    if anchor is ARImageAnchor {
        //my image is detected
        if let ballNode = self.ballNode {

            //For some reason, changing the y position translates the ball in the z direction.
            //A positive y value moves it towards the screen (in front of the marker):
            ballNode.position = SCNVector3(0.0, 0.02, 0.0)

            //A negative y value moves it away from the screen (behind the marker):
            ballNode.position = SCNVector3(0.0, -0.02, 0.0)
            node.addChildNode(ballNode)
        }
    }

    return node
}

How to make the scene to respect the z position? Or in other words, how to show a 3D model behind an image/marker that has been detected using ARKit framework?

I am running iOS 12 and using Xcode 10.3. Let me know if any other information is needed.


2 Answers


To achieve that you need to create an occluder in the 3D scene. Since an ARReferenceImage has a physicalSize, it should be straightforward to add a geometry to the scene when the ARImageAnchor is created.

The geometry would be an SCNPlane with an SCNMaterial appropriate for an occluder. I would opt for the SCNLightingModelConstant lighting model (it's the cheapest, and we won't actually draw the plane) with a colorBufferWriteMask equal to SCNColorMaskNone. The object will be invisible but will still write to the depth buffer (that's how it will act as an occluder).

Finally, make sure that the occluder is rendered before any augmented object by setting its renderingOrder to -1 (or an even lower value if the app already uses rendering orders).
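
A minimal sketch of that idea, assuming the renderer(_:didAdd:for:) variant of the delegate method instead of the nodeFor variant used in the question (the plane rotation and the exact property values are my assumptions):

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }

    // A plane matching the physical size of the detected reference image.
    let size = imageAnchor.referenceImage.physicalSize
    let occluderGeometry = SCNPlane(width: size.width, height: size.height)

    // Cheapest lighting model; write depth only, no color (acts as an occluder).
    let material = SCNMaterial()
    material.lightingModel = .constant
    material.colorBufferWriteMask = []   // SCNColorMaskNone
    occluderGeometry.firstMaterial = material

    let occluderNode = SCNNode(geometry: occluderGeometry)
    occluderNode.eulerAngles.x = -.pi / 2   // lay the plane flat on the detected image
    occluderNode.renderingOrder = -1        // draw before any augmented content
    node.addChildNode(occluderNode)
}

With such an occluder in place, the ball node from the question can keep its negative offset along the anchor's y axis, and the depth written by the invisible plane will hide it.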

mnuages

In ARKit 3.0, Apple engineers implemented a ZDepth compositing technique called People Occlusion. This feature is available only on devices with A12 and A13 chips because it's highly processor-intensive. At the moment, ARKit's ZDepth compositing feature is in its infancy, so it only lets you composite people (or people-like objects) over and under the background, not any other object seen through the rear camera. And, I think, you know about the front TrueDepth camera: it's for face tracking, and it has an additional IR sensor for this task.

To turn the ZDepth compositing feature on, use the following properties in ARKit 3.0:

var frameSemantics: ARConfiguration.FrameSemantics { get set }

static var personSegmentationWithDepth: ARConfiguration.FrameSemantics { get }

Real code should look like this:

let config = ARWorldTrackingConfiguration()

// People Occlusion requires an A12/A13 device, so check for support first.
if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
    config.frameSemantics.insert(.personSegmentationWithDepth)
    mySession.run(config)
}

After alpha channel segmentation, the formula for computing every channel looks like this:

r = Az > Bz ? Ar : Br
g = Az > Bz ? Ag : Bg
b = Az > Bz ? Ab : Bb
a = Az > Bz ? Aa : Ba
  • where Az is the ZDepth channel of the Foreground image (3D model)
  • Bz is the ZDepth channel of the Background image (2D video)
  • Ar, Ag, Ab, Aa – Red, Green, Blue and Alpha channels of the 3D model
  • Br, Bg, Bb, Ba – Red, Green, Blue and Alpha channels of the 2D video
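
As a toy illustration of that rule, here is a hypothetical per-pixel helper (the Pixel type and the function are illustrative only, not part of any ARKit API):

// Hypothetical per-pixel helper illustrating the ZDepth compositing rule above.
struct Pixel {
    var r, g, b, a: Float   // color and alpha channels
    var z: Float            // ZDepth channel
}

func zDepthComposite(_ A: Pixel, _ B: Pixel) -> Pixel {
    // Per component, take the foreground (A = 3D model) when its ZDepth wins,
    // otherwise the background (B = 2D video).
    Pixel(r: A.z > B.z ? A.r : B.r,
          g: A.z > B.z ? A.g : B.g,
          b: A.z > B.z ? A.b : B.b,
          a: A.z > B.z ? A.a : B.a,
          z: max(A.z, B.z))
}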

But earlier versions of ARKit have no ZDepth compositing feature, so a 3D model can only be composited over the 2D background video using the standard 4-channel OVER compositing operation:

(Argb * Aa) + (Brgb * (1 - Aa))
  • where Argb is the RGB channels of the Foreground image A (3D model)

  • Aa is the Alpha channel of the Foreground image A (3D model)

  • Brgb is the RGB channels of the Background image B (2D video)

  • (1 - Aa) is an inversion of the Foreground Alpha channel
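
For completeness, the same OVER operator as a tiny per-channel helper (plain Swift, assuming non-premultiplied alpha; purely illustrative):

// Hypothetical helper: straight (non-premultiplied) OVER for a single color channel.
func over(_ Argb: Float, _ Aa: Float, _ Brgb: Float) -> Float {
    (Argb * Aa) + (Brgb * (1 - Aa))
}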

As a result, without the personSegmentationWithDepth property your 3D model will always be composited OVER the 2D video.

Thus, unless an object in the video looks like a human hand or a human body, you can't use regular ARKit tools to place that object from the 2D video over the 3D model.

.....

Nonetheless, you can do it using the Metal and AVFoundation frameworks. Be aware that it's not easy.

To extract ZDepth data from the video stream, you need the following ARFrame instance property:

// Works from iOS 11
var capturedDepthData: AVDepthData? { get }

Or you may use these two ARMatteGenerator instance methods (remember that the ZDepth channel must be 32-bit):

// Works from iOS 13
func generateDilatedDepth(from frame: ARFrame,
                          commandBuffer: MTLCommandBuffer) -> MTLTexture

func generateMatte(from frame: ARFrame,
                   commandBuffer: MTLCommandBuffer) -> MTLTexture
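
A minimal usage sketch, assuming iOS 13+, an existing MTLDevice, and the current ARFrame from the session (the MatteTextures wrapper and the makeMattes name are placeholders of mine):

import ARKit
import Metal

// Sketch: produce the matte and dilated-depth textures for custom Metal compositing.
final class MatteTextures {
    private let matteGenerator: ARMatteGenerator
    private let commandQueue: MTLCommandQueue

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        commandQueue = queue
        matteGenerator = ARMatteGenerator(device: device, matteResolution: .half)
    }

    func makeMattes(for frame: ARFrame) -> (matte: MTLTexture, dilatedDepth: MTLTexture)? {
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return nil }
        let matte = matteGenerator.generateMatte(from: frame, commandBuffer: commandBuffer)
        let depth = matteGenerator.generateDilatedDepth(from: frame, commandBuffer: commandBuffer)
        // In a real renderer you would encode your compositing pass into this same
        // command buffer before committing, so the textures are ready when it runs.
        commandBuffer.commit()
        return (matte, depth)
    }
}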

Please read this SO post if you want to know how to do it using Metal.

For additional information, please read this SO post.

Andy Jazz