
The documentation for CGImagePropertyOrientation says:

When the user captures a photo while holding the device in portrait orientation, iOS writes an orientation value of CGImagePropertyOrientation.right in the resulting image file.

In the sample code from Object Tracking in Vision (WWDC 2018), which uses the front camera, the conversion is:

func exifOrientationForDeviceOrientation(_ deviceOrientation: UIDeviceOrientation = UIDevice.current.orientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown:
        return .rightMirrored
    case .landscapeLeft:
        return .downMirrored
    case .landscapeRight:
        return .upMirrored
    default:
        return .leftMirrored
    }
}

What's the relationship between device orientation and EXIF orientation, depending on the position of the camera (front or back)?

Willjay

3 Answers


I think this topic warrants in-depth investigation. No matter how many times I deal with this, I still get it all wrong and end up solving it by trial and error. Here are my observations:

(1) According to the sample code in Recognizing Objects in Live Capture (https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture), the definition is:

public func exifOrientationFromDeviceOrientation() -> CGImagePropertyOrientation {
    let curDeviceOrientation = UIDevice.current.orientation
    let exifOrientation: CGImagePropertyOrientation

    switch curDeviceOrientation {
    case UIDeviceOrientation.portraitUpsideDown:  // Device oriented vertically, home button on the top
        exifOrientation = .left
    case UIDeviceOrientation.landscapeLeft:       // Device oriented horizontally, home button on the right
        exifOrientation = .upMirrored
    case UIDeviceOrientation.landscapeRight:      // Device oriented horizontally, home button on the left
        exifOrientation = .down
    case UIDeviceOrientation.portrait:            // Device oriented vertically, home button on the bottom
        exifOrientation = .up
    default:
        exifOrientation = .up
    }
    return exifOrientation
}

This looks a bit different from your post, so just saying that this file defines their relationship probably isn't going to generalize; there must be a deeper explanation that would help us understand it better.

(2) In your target's deployment info, there's a section for "Device Orientation". If I check "Landscape Left" and hold the device in this supported orientation, debugging the above exifOrientationFromDeviceOrientation at runtime gives me .down, which means the device reported UIDeviceOrientation.landscapeRight?!? I just don't get why there is a contradiction, and I didn't have time to dig into it and had to move on.
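
For what it's worth, the "Device Orientation" checkboxes in deployment info describe interface orientations, and in landscape the two enums are named oppositely: UIDeviceOrientation.landscapeLeft (home button on the right) corresponds to UIInterfaceOrientation.landscapeRight, and vice versa. A minimal sketch to log both values side by side (the helper name is purely illustrative):

import UIKit

// Illustrative helper: log the device orientation next to the interface
// orientation so the landscape naming mismatch is visible at runtime.
func logOrientations(for view: UIView) {
    let device = UIDevice.current.orientation
    let interface = view.window?.windowScene?.interfaceOrientation
    // Holding the device with the home button on the right prints
    // device == .landscapeLeft but interface == .landscapeRight.
    print("device: \(device.rawValue), interface: \(String(describing: interface?.rawValue))")
}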

(3) There's yet another orientation-related type called AVCaptureVideoOrientation that's used when you set up the video output orientation. For the case above, I need to set it to .landscapeRight, which is consistent with the device orientation but opposite of the target deployment info. That at least makes some sense, since the video orientation convention had better match the UIDevice orientation convention. However, it confused the hell out of me during debugging: I previewed the CVImageBuffer in the captureOutput delegate and saw that it was upside down! But I guess, conspiring together with exifOrientationFromDeviceOrientation, everything just worked. Note: I deployed my own YOLO v2 object detection net, trained and built in Keras (converted with coremltools), and tried to draw bounding boxes on an iPad that I only want to work in one orientation (I think it will be another tedious task if it ever needs to work in all orientations).
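
For reference, here's a minimal sketch of where that AVCaptureVideoOrientation setting goes; the function name is illustrative, and .landscapeRight is just the value from the case described above:

import AVFoundation

// Illustrative sketch: after adding the video data output to the session,
// fix the connection's video orientation so the sample buffers arrive
// rotated the way the rest of the pipeline expects.
func configureVideoOrientation(on output: AVCaptureVideoDataOutput) {
    guard let connection = output.connection(with: .video) else { return }
    if connection.isVideoOrientationSupported {
        connection.videoOrientation = .landscapeRight // match your single supported orientation
    }
}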

At the end of the day, I'd really like to see better documentation from Apple, or some hero to come forward and explain all of this in a blog post. I just hope whatever I did carries over to other devices in the same supported orientation, since I don't own enough variety of Apple hardware to test on.

I may post the POC project in a Git repo. If I do, I'll come back and post the link here so you can check what I've described against the code itself.

kawingkelvin

The conversion is a function of the device orientation, as well as the camera position (front or back). The most accurate function I've found so far is this gist (or this other answer), which works great for the Vision framework. Here's a slightly modified version of the same gist retaining the same logic:

extension CGImagePropertyOrientation {
  init(isUsingFrontFacingCamera: Bool, deviceOrientation: UIDeviceOrientation = UIDevice.current.orientation) {
    switch deviceOrientation {
    case .portrait:
      self = .right
    case .portraitUpsideDown:
      self = .left
    case .landscapeLeft:
      self = isUsingFrontFacingCamera ? .down : .up
    case .landscapeRight:
      self = isUsingFrontFacingCamera ? .up : .down
    default:
      self = .right
    }
  }
}
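
Here's a hedged usage sketch showing how the extension above plugs into Vision; isUsingFrontFacingCamera stands in for whatever flag your capture setup keeps for the active camera position:

import CoreVideo
import UIKit
import Vision

// Illustrative only: build a request handler for a sample buffer's pixel
// buffer, using the extension above to pick the matching EXIF orientation.
func makeRequestHandler(for pixelBuffer: CVPixelBuffer,
                        isUsingFrontFacingCamera: Bool) -> VNImageRequestHandler {
    let orientation = CGImagePropertyOrientation(
        isUsingFrontFacingCamera: isUsingFrontFacingCamera,
        deviceOrientation: UIDevice.current.orientation)
    return VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                 orientation: orientation,
                                 options: [:])
}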

I tried verifying the results using this method:

  1. Create a new project in Xcode 11.6

  2. Add NSCameraUsageDescription to info.plist.

  3. Replace ViewController.swift with the code below.

  4. Update devicePositionToTest to front/back depending on which one you want to test.

  5. Replace SEARCH STRING HERE with a piece of text you are going to scan.

  6. Run the app, and point it at the text, while changing orientations.

  7. You will make the following observations:

    • Back camera:
      • .portrait: .right and .up both work.
      • .landscapeRight: .down and .right.
      • .portraitUpsideDown: .left and .down.
      • .landscapeLeft: .up and .left.
    • Front camera:
      • .portrait: .right and .up.
      • .landscapeRight: .up and .left.
      • .portraitUpsideDown: .left and .down.
      • .landscapeLeft: .down and .right.

Notice how, no matter the camera/device orientation, there are always two different orientations that work. This is because, for example, in the portrait + back camera case, left-to-right text is recognized normally (as you would expect), but text flowing top to bottom is also recognized.

However, the first orientation listed for each case is more accurate than the second; you'll get a lot more junk data if you go with the second one. You can verify this by printing out the entire contents of allStrings below.

Note that this was only tested for the vision framework. If you're using the sample buffer for something else, or have the camera configured differently, you may need a different conversion function.

import AVFoundation
import UIKit
import Vision

let devicePositionToTest = AVCaptureDevice.Position.back
let expectedString = "SEARCH STRING HERE"

class ViewController: UIViewController {

  let captureSession = AVCaptureSession()

  override func viewDidLoad() {
    super.viewDidLoad()

    // 1. Set up input
    let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: devicePositionToTest)!
    if device.isFocusModeSupported(.continuousAutoFocus) {
      try! device.lockForConfiguration()
      device.focusMode = .continuousAutoFocus
      device.unlockForConfiguration()
    }
    let input = try! AVCaptureDeviceInput(device: device)
    captureSession.addInput(input)

    // 2. Set up output
    let output = AVCaptureVideoDataOutput()
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "com.example"))
    captureSession.addOutput(output)

    // 3. Set up connection
    let connection = output.connection(with: .video)!
    assert(connection.isCameraIntrinsicMatrixDeliverySupported)
    connection.isCameraIntrinsicMatrixDeliveryEnabled = true

    let previewView = CaptureVideoPreviewView(frame: CGRect(x: 0, y: 0, width: 400, height: 400))
    previewView.videoPreviewLayer.videoGravity = .resizeAspect
    previewView.videoPreviewLayer.session = captureSession

    view.addSubview(previewView)

    captureSession.startRunning()
  }
}

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
  func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)!
    let options: [VNImageOption: Any] = [.cameraIntrinsics: cameraIntrinsicData]

    let allCGImageOrientations: [CGImagePropertyOrientation] = [.up, .upMirrored, .down, .downMirrored, .leftMirrored, .right, .rightMirrored, .left]
    allCGImageOrientations.forEach { orientation in
      let imageRequestHandler = VNImageRequestHandler(
        cvPixelBuffer: pixelBuffer,
        orientation: orientation,
        options: options)

      let request = VNRecognizeTextRequest { value, error in
        let observations = value.results as! [VNRecognizedTextObservation]
        let allStrings = observations.compactMap { $0.topCandidates(1).first?.string.lowercased() }.joined(separator: " ")
        if allStrings.contains(expectedString) {
          print("FOUND MATCH. deviceOrientation: \(UIDevice.current.orientation). exifOrientation: \(orientation)")
        }
      }
      request.recognitionLevel = .accurate
      request.usesLanguageCorrection = true

      try! imageRequestHandler.perform([request])
    }
  }
}

class CaptureVideoPreviewView: UIView {
  override class var layerClass: AnyClass {
    return AVCaptureVideoPreviewLayer.self
  }

  var videoPreviewLayer: AVCaptureVideoPreviewLayer {
    layer as! AVCaptureVideoPreviewLayer
  }
}

Senseful

But the relation is already defined in the snippet you posted.

The camera in an iPhone is mounted so that the image is correctly oriented when the phone is held in one of the landscape orientations.

The camera has no knowledge of orientation and always returns the image data as-is. That data is wrapped in a CGImage, which still has no orientation, which in turn is wrapped in a UIImage, which does have an orientation.

Since it would be very wasteful to swap bytes just to get a correctly oriented image, it is better to attach orientation metadata, from which a transform matrix can be built to present the image correctly. There are also mirrored variants, which I believe are mostly used with the front camera. When you open the Camera app and take a selfie, notice that what you see in the preview is mirrored compared to the resulting photo. This simulates a mirror effect, and the same logic is not applied to the back camera.
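
A minimal sketch of that "metadata, not bytes" idea (the function is purely illustrative): the same CGImage can be wrapped with different orientations without touching the pixel data; only the transform applied at draw time changes.

import UIKit

// Illustrative only: two UIImages sharing the same CGImage pixel data,
// differing only in the orientation metadata used when drawing.
func orientedVariants(of cgImage: CGImage) -> (asCaptured: UIImage, rotated: UIImage) {
    let asCaptured = UIImage(cgImage: cgImage, scale: 1, orientation: .up)
    let rotated = UIImage(cgImage: cgImage, scale: 1, orientation: .right) // drawn with a 90° rotation applied
    return (asCaptured, rotated)
}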

Anyway, depending on the device orientation, we need to rotate the received CGImage so it is presented correctly. In the system you posted, when the device is in portrait the image should be rotated left and mirrored (I have no idea which comes first, nor which way the mirroring is done, but it is described in the documentation). Naturally, upside-down is then rotated right, and left or right are what remains: when the phone is turned to landscape toward the right (I assume clockwise), the image is marked as being exactly as received by the camera, but mirrored.

I am not sure why mirrored is used, or why (if what you say is correct) iOS uses the property right in portrait while EXIF uses left, but it should come down to how these values are defined. One system may say right means the image is rotated clockwise (CW) and needs to be rotated counter-clockwise (CCW) when presented. The other may say right means the image should be rotated CW to be visualized correctly, because the original is rotated CCW.

I hope this clears up your questions.

Matic Oblak