
I'm trying to get the orientation of text from an image. I have 8 types of images with different orientations; all the types are shown in the next image (I will put a link to a repository where you can get all the input images):

[Image: plan types]

I was using these libraries to detect the orientation of the text in an image:

import pytesseract as tess
from PIL import Image

my_image = Image.open(temp_image_path)
osd = tess.image_to_osd(my_image)
print(osd)

Output: this is what I got:

    Page number: 0
    Orientation in degrees: 270
    Rotate: 90
    Orientation confidence: 2.77
    Script: Cyrillic
    Script confidence: 2.88
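
As a side note, the same OSD result can also be read programmatically instead of parsing the printed string (a minimal sketch; the exact dict keys can vary slightly between pytesseract versions):

# Reusing my_image from the snippet above; ask pytesseract for
# the OSD fields as a dict instead of a raw string
osd = tess.image_to_osd(my_image, output_type=tess.Output.DICT)
print(osd["rotate"], osd["orientation_conf"])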

What I don't get is why a vertical plan with vertical text (type II in my image) sometimes yields Rotate: 90 and other times Rotate: 270.

I also tried OpenCV and TensorFlow; they helped me find similarities between images, but not to identify whether the text has a different orientation.

This is the GitHub repository with the inputs: Click Here

  • Look at the footer of the floor plan, where all the info is written. It hints at the possible orientation of each plan type if you use it in conjunction with the plan's aspect ratio, e.g., more height than width + footer on the left side -> type II plan. The challenge is to process the footer and make it a "feature". Maybe if you crop the plan and reduce it to a column and a row you can find the location of the footer, as that position will contain the highest concentration of black pixels (see the sketch after these comments). You can also examine the number of blobs (and their location) via their bounding boxes for each floor plan. – stateMachine Jan 31 '23 at 02:04
  • You can even downsample (downsize) each floor plan by a considerable factor and it will retain the necessary footer info to carry out my suggestion above. – stateMachine Jan 31 '23 at 02:08
  • It's a good recommendation; however, the challenge would be knowing which part of the image to crop when I don't know which of the 8 types the image is. – Joel Barrantes Feb 01 '23 at 03:51
  • I did similar projects a few years ago; the solution I used was to train a classifier, and it works. – StereoMatching Feb 01 '23 at 10:18
  • @StereoMatching would you mind explaining what you did in your project, or sharing a repository with a simple example? – Joel Barrantes Feb 01 '23 at 17:20
  • @JoelBarrantes It was a commercial project (around 2018?) so I cannot share it with you. It was just a dead-simple convolutional neural network for classification. You should be able to find similar code in Kaggle's "cat vs dog" competitions. Another trick I used is the "teacher and student" trick: because there was too much unlabeled data, I first trained a weak classifier, used the weak classifier to classify the unlabeled data, then retrained the classifier on the bigger dataset (remember to do it with image augmentation), and iterated until the classifier reached 95%+ accuracy on the golden test set. – StereoMatching Feb 08 '23 at 14:47
  • @JoelBarrantes Another solution, I think, is contours. Use OpenCV's findContours function to extract the contours, then judge by their geometric relationships. I recommend you give contours a shot before jumping into training a classifier. – StereoMatching Feb 08 '23 at 14:54
  • @StereoMatching my repository with a simple example is up :-) I hope that helps you understand what I mean. – Joel Barrantes Feb 16 '23 at 04:00
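
To make the projection idea from the first comment concrete, here is a minimal sketch (the file path, the downsampling factor and the quarter-strip heuristic are my own placeholders; mapping footer side plus aspect ratio to one of the 8 plan types is left to fill in):

import cv2

# Load, heavily downsample and binarize the plan (ink = 1, background = 0)
img = cv2.imread("plan.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
small = cv2.resize(img, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)
_, binary = cv2.threshold(small, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Reduce the image to one column and one row of ink counts (projections)
col_profile = binary.sum(axis=0)  # ink per column -> left/right structure
row_profile = binary.sum(axis=1)  # ink per row    -> top/bottom structure

h, w = binary.shape
aspect = w / h

# Heuristic: the footer is the border strip with the most ink
sides = {
    "left": col_profile[: w // 4].sum(),
    "right": col_profile[3 * w // 4:].sum(),
    "top": row_profile[: h // 4].sum(),
    "bottom": row_profile[3 * h // 4:].sum(),
}
footer_side = max(sides, key=sides.get)

# e.g. aspect < 1 (portrait) and footer on the left -> type II
print(aspect, footer_side)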

1 Answer


Following @stateMachine's recommendation, detecting the footer position and the aspect ratio is a good idea. You can try to do so by detecting squares in the image. This should be fairly easy to do with OpenCV; see the sketch below.
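
For instance, a minimal square/rectangle detector based on contour approximation could look like this (the path and the area threshold are placeholders; written against OpenCV 4.x, where findContours returns two values):

import cv2

img = cv2.imread("plan.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

rects = []
for cnt in contours:
    # Approximate each contour with a coarser polygon
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    # Keep convex quadrilaterals with a non-trivial area
    if len(approx) == 4 and cv2.isContourConvex(approx) and cv2.contourArea(approx) > 1000:
        rects.append(cv2.boundingRect(approx))  # (x, y, w, h)

print(rects)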

If you have some labeled images you can also try @StereoMatching's idea. In this case, a very simple HOG descriptor as the image representation plus a Support Vector Machine for the classification should do the trick. You can use OpenCV's HOGDescriptor and sklearn's SVC.

Assuming you have a nice load() function for your (small) dataset, you can do something like this:

import cv2
from sklearn import svm
from sklearn.model_selection import cross_val_score

### Load the dataset
path_images, labels = load(dataset)

### HOG descriptor options ###
winSize = (112, 112)
blockSize = (16, 16)
blockStride = (8, 8)
cellSize = (8, 8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 0
nlevels = 64
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture,
                        winSigma, histogramNormType, L2HysThreshold, gammaCorrection, nlevels)

### Get the dataset representation: resize each image to the HOG window size so
### every descriptor has the same fixed length, then flatten it to a 1-D vector
hogds = [hog.compute(cv2.resize(cv2.imread(p), winSize)).flatten() for p in path_images]

### Get a sense of the performance with 5-fold cross-validation
clf = svm.SVC(class_weight='balanced')
print(cross_val_score(clf, hogds, labels, cv=5))
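
If the cross-validation scores look reasonable, you can then fit on the whole dataset and classify a new plan the same way (a sketch; "new_plan.png" is a placeholder):

# Reusing clf, hog, winSize, hogds and labels from the block above
clf.fit(hogds, labels)
feat = hog.compute(cv2.resize(cv2.imread("new_plan.png"), winSize)).flatten()
print(clf.predict([feat]))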

maxew