
I'm trying to take two images using the camera, and align them using the iOS Vision framework:

let visionHandler = VNSequenceRequestHandler()

func align(firstImage: CIImage, secondImage: CIImage) {
  let request = VNTranslationalImageRegistrationRequest(
      targetedCIImage: firstImage) { request, error in
    if error != nil {
      fatalError()
    }
    let observation = request.results!.first
        as! VNImageTranslationAlignmentObservation
    let alignedSecondImage = secondImage.transformed(
        by: observation.alignmentTransform)
    let compositedImage = firstImage.applyingFilter(
        "CIAdditionCompositing",
        parameters: ["inputBackgroundImage": alignedSecondImage])
    // Save compositedImage to the photo library.
  }

  try! visionHandler.perform([request], on: secondImage)
}

But this produces grossly mis-aligned images:

[Three screenshots of misaligned composites: a close-up subject, an indoor scene, and an outdoor scene.]

You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.

I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?

I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for alignment. Orientation matters for tasks like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation isn't the problem.
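For what it's worth, the sequence handler can be given an explicit orientation when performing the request. A minimal sketch, assuming the captures' orientation is known (`.up` here is a placeholder, not a value from the question):

```swift
// Sketch: pass the images' CGImagePropertyOrientation explicitly.
// .up is a placeholder; substitute the actual orientation of your captures.
try visionHandler.perform([request], on: secondImage, orientation: .up)
```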

My compositing code works correctly; it's only the Vision framework that's a problem. If I remove the calls to the Vision framework and put the phone on a tripod, the composite is perfectly aligned. So the problem is the Vision framework.

This is on iPhone X.

How do I get the Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?

Kartick Vaddadi
  • I'd be curious to see how the program aligns a subset/portion of a picture to the whole picture. Are the alignments stochastic (does the output vary despite the input being the same)? Alignment programs often make approximations/simplifications to reduce computation time. Stochastic programming is a way to compensate. – Ghoti Mar 12 '18 at 22:26
  • @Kartick How did you end up doing this? – user16930239 Sep 25 '21 at 23:17
  • @Jabbar I gave up. – Kartick Vaddadi Sep 26 '21 at 02:13

2 Answers


You should set secondImage as the targeted image, and perform the handler on firstImage.

I used your compositing code as-is.
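Applied to the code in the question, that swap would look roughly like this (an untested sketch that reuses the question's names):

```swift
// secondImage is the targeted image (the one to be aligned);
// firstImage, passed to the handler, is the reference.
let request = VNTranslationalImageRegistrationRequest(
    targetedCIImage: secondImage) { request, error in
  guard error == nil,
        let observation = request.results?.first
            as? VNImageTranslationAlignmentObservation else { return }
  let alignedSecondImage = secondImage.transformed(
      by: observation.alignmentTransform)
  let compositedImage = firstImage.applyingFilter(
      "CIAdditionCompositing",
      parameters: ["inputBackgroundImage": alignedSecondImage])
  // Save compositedImage to the photo library.
}

try? visionHandler.perform([request], on: firstImage)
```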

user38155

Check out this example from MLBoy:

let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])

let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
    try handler.perform([request])
} catch {
    print(error)
}

guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform

image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])


user16930239