Using the Vision framework's VNDetectTextRectanglesRequest, you can only find the regions of visible text in an image; it does not recognize the characters themselves, so that alone is not enough to get text out of an image with Swift.
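For context, a minimal sketch of the detection request might look like this (assuming you already have the source image as a CGImage named cgImage; set reportCharacterBoxes so each observation carries character-level boxes):

    import Vision

    // A minimal sketch, assuming `cgImage` holds the source image.
    let request = VNDetectTextRectanglesRequest { request, _ in
        guard let textObservations = request.results as? [VNTextObservation] else {
            return
        }
        // Crop each observation as shown below.
    }
    // Needed so each VNTextObservation carries its characterBoxes.
    request.reportCharacterBoxes = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])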
The first step is to crop the image: for each VNTextObservation, compute the bounding box of its character boxes and crop that region out of the source image, like so:
    for textObservation in textObservations {
        guard let rects = textObservation.characterBoxes else {
            continue
        }
        // Union the normalized character boxes into one bounding box.
        var xMin = CGFloat.greatestFiniteMagnitude
        var xMax: CGFloat = 0
        var yMin = CGFloat.greatestFiniteMagnitude
        var yMax: CGFloat = 0
        for rect in rects {
            xMin = min(xMin, rect.bottomLeft.x)
            xMax = max(xMax, rect.bottomRight.x)
            yMin = min(yMin, rect.bottomRight.y)
            yMax = max(yMax, rect.topRight.y)
        }
        // Vision coordinates are normalized (0...1) with a bottom-left origin,
        // so flip the y axis when converting to pixel coordinates for CGImage,
        // whose origin is top-left. `size` is the pixel size of the source image.
        let imageRect = CGRect(x: xMin * size.width,
                               y: (1 - yMax) * size.height,
                               width: (xMax - xMin) * size.width,
                               height: (yMax - yMin) * size.height)
        // Crop the detected region out of `image`, the source UIImage.
        guard let croppedImage = image.cgImage?.cropping(to: imageRect) else {
            continue
        }
        // `croppedImage` now holds one detected text region, ready for step two.
    }
The second step is to send the cropped images to an image-processing library such as OpenCV. There are online tutorials about how to integrate it with iOS, and you can use an Objective-C bridging header if you want to call it from Swift: https://medium.com/pharos-production/using-opencv-in-a-swift-project-679868e1b798
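If your preprocessing needs are simple (grayscale, contrast), Core Image can stand in for OpenCV without any Objective-C bridging. This is just a sketch of one possible cleanup pass, not something from the original workflow; the filter values are arbitrary starting points:

    import CoreImage
    import UIKit

    // Grayscale + contrast boost as a lightweight stand-in for OpenCV
    // preprocessing. `croppedImage` is a CGImage from the cropping step.
    func preprocess(_ croppedImage: CGImage) -> CGImage? {
        let input = CIImage(cgImage: croppedImage)
        guard let filter = CIFilter(name: "CIColorControls") else { return nil }
        filter.setValue(input, forKey: kCIInputImageKey)
        filter.setValue(0.0, forKey: kCIInputSaturationKey) // drop color
        filter.setValue(1.5, forKey: kCIInputContrastKey)   // boost contrast
        guard let output = filter.outputImage else { return nil }
        return CIContext().createCGImage(output, from: output.extent)
    }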
Once you have the processed image, the third step, as mentioned by Nick, is to run it through an OCR engine such as Tesseract or the ABBYY SDK.
Tesseract is free to use, and you can find an iOS framework for Tesseract 3.03-rc1 here. The most important thing to be aware of with OCR tools is language: what language are you trying to convert, and what language does the text in the image use? You can find trained data for many languages in the Tesseract repository.
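For illustration, here is a rough sketch of the OCR call using the TesseractOCRiOS wrapper (G8Tesseract); the class name, language code, and the assumption that English traineddata is bundled with the app are all specifics you would adapt to your own setup:

    import TesseractOCR
    import UIKit

    // A rough sketch using the TesseractOCRiOS wrapper; assumes the "eng"
    // traineddata file is bundled with the app.
    func recognizeText(in processedImage: UIImage) -> String? {
        guard let tesseract = G8Tesseract(language: "eng") else { return nil }
        tesseract.image = processedImage
        tesseract.recognize()
        return tesseract.recognizedText
    }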
In summary, the workflow is: Image Capture -> Image Processing -> OCR.