1

I have a huge dataset of images, each with a logo at an arbitrary position on white paper. How can I retrieve the coordinates (top left and bottom right) of the object in each image using Python?

For example, consider this image: http://ak9.picdn.net/shutterstock/videos/5360279/thumb/3.jpg (ignore the shadow). I want to highlight the egg in the image.

EDIT: The images are high-resolution and there are very many of them, so an iterative solution takes a considerable amount of time. One thing I forgot to mention is that the images are stored in 1-bit mode, so I think a better solution should be possible with NumPy.

user2578525
  • Don't post links. Please add the picture to the question itself. And provide a minimal working example to show us what you have tried so far. – buhtz Apr 10 '18 at 09:49
  • This could help - https://stackoverflow.com/questions/32531377/ – Divakar Apr 10 '18 at 10:59

2 Answers

1

If the rest of the picture is a single colour, you can compare each pixel with the background and treat the first pixel that differs as the edge of the object, as in the code below. Note that I assume the top-left pixel (pix[0, 0]) has the background colour; if this is not always the case, use a different approach (for instance, take the most frequent pixel colour as the background):

import numpy as np
from PIL import Image

def get_y_top(pix, width, height, background, difference):
    # Scan rows from the top; return the first row with a non-background pixel.
    back_np = np.array(background)
    for y in range(0, height):
        for x in range(0, width):
            # np.max also works for single-channel modes, where pix[x, y] is a plain int
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return y

def get_y_bot(pix, width, height, background, difference):
    # Scan rows from the bottom; return the last row with a non-background pixel.
    back_np = np.array(background)
    for y in range(height-1, -1,  -1):
        for x in range(0, width):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return y

def get_x_left(pix, width, height, background, difference):
    # Scan columns from the left; return the first column with a non-background pixel.
    back_np = np.array(background)
    for x in range(0, width):
        for y in range(0, height):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return x

def get_x_right(pix, width, height, background, difference):
    # Scan columns from the right; return the last column with a non-background pixel.
    back_np = np.array(background)
    for x in range(width-1, -1, -1):
        for y in range(0, height):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return x

img = Image.open('test.jpg')
width, height = img.size
pix = img.load()
background = pix[0,0]

difference = 20  # per-channel tolerance; tune by trial and error for your images
y_top = get_y_top(pix, width, height, background, difference)
y_bot = get_y_bot(pix, width, height, background, difference)
x_left = get_x_left(pix, width, height, background, difference)
x_right = get_x_right(pix, width, height, background, difference)

Using this information you can crop your image and save:

img = img.crop((x_left, y_top, x_right + 1, y_bot + 1))  # right/lower bounds are exclusive
img.save('test3.jpg')

Resulting in this: enter image description here
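The fallback mentioned above, taking the most frequent pixel colour as the background instead of trusting one corner pixel, could be sketched as follows. This is only an illustration: the helper name most_common_colour and the synthetic test page are assumptions, not part of the answer's code.

```python
import numpy as np

def most_common_colour(pixels):
    """Return the most frequent colour in an (H, W, C) or (H, W) array as a tuple."""
    arr = np.asarray(pixels)
    # Flatten to one row per pixel; greyscale images get a single channel.
    flat = arr.reshape(-1, arr.shape[2]) if arr.ndim == 3 else arr.reshape(-1, 1)
    colours, counts = np.unique(flat, axis=0, return_counts=True)
    return tuple(int(c) for c in colours[counts.argmax()])

# Synthetic example (hypothetical data): white page with a small dark square
page = np.full((100, 120, 3), 255, dtype=np.uint8)
page[40:60, 30:50] = (10, 20, 30)
background = most_common_colour(page)
print(background)  # (255, 255, 255)
```

The returned tuple can be passed as background to the functions above in place of pix[0, 0]. Sorting every pixel is not free on very large images, so computing it on a subsample may be preferable.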

Nathan
  • This solution works fine but my images are very high res so it takes considerable amount of time. Please see my edit – user2578525 Apr 10 '18 at 10:52
  • @user2578525 If you know the approximate size of the object, you can probably skip a number of pixels and still get pretty good results. If you check only every third pixel in each direction, this cuts the time by a factor of 9 (3x faster in x and 3x faster in y) – Nathan Apr 10 '18 at 11:00
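The pixel-skipping idea from the comment above might look like this with NumPy slicing. It is a rough sketch under assumptions: the function name coarse_bounding_box and the boolean mask input (True where the object is) are hypothetical. The padding is exact only for a solid, roughly rectangular object at least step pixels wide and tall; thin or irregular outlines can fall between samples.

```python
import numpy as np

def coarse_bounding_box(mask, step=3):
    """Approximate (left, top, right, bottom) using every `step`-th pixel.

    `mask` is a 2-D boolean array, True where the object is.
    Returns None if no sampled pixel belongs to the object.
    """
    sampled = mask[::step, ::step]
    rows = np.flatnonzero(sampled.any(axis=1))
    cols = np.flatnonzero(sampled.any(axis=0))
    if rows.size == 0:
        return None
    height, width = mask.shape
    # Map back to full resolution and pad by step - 1 pixels, because an
    # edge can lie between two sampled rows or columns.
    pad = step - 1
    left = max(int(cols[0]) * step - pad, 0)
    top = max(int(rows[0]) * step - pad, 0)
    right = min(int(cols[-1]) * step + pad, width - 1)
    bottom = min(int(rows[-1]) * step + pad, height - 1)
    return left, top, right, bottom

# Synthetic example (hypothetical data): a filled rectangle on an empty page
mask = np.zeros((100, 120), dtype=bool)
mask[41:60, 31:50] = True
print(coarse_bounding_box(mask, step=3))  # (31, 40, 50, 59)
```

The true box here is (31, 41, 49, 59), so the coarse result is off by at most one pixel per side. If exact edges matter, the padded box can be rescanned at full resolution.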
0

For this image (the egg on the white background): enter image description here

You can crop it in the following steps:

  1. Read and convert to gray
  2. Threshold and Invert
  3. Find the extreme coordinates and crop

For the egg image, with shape (480, 852, 3), this takes 0.016 s.


The code:


#!/usr/bin/python3
# 2018/04/10 19:39:14
# 2018/04/10 20:25:36 
import cv2
import numpy as np
import matplotlib.pyplot as plt

import time
ts = time.time()

## 1. Read and convert to gray
fname = "egg.jpg"
img = cv2.imread(fname)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

##  2. Threshold and Invert
th, dst = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY_INV)

##  3. Find the extreme coordinates and crop 
ys, xs = np.where(dst>0)
target = img[ys.min():ys.max()+1, xs.min():xs.max()+1]  # +1: slice ends are exclusive

te = time.time()
print("Time passed: {:.3f} s".format(te-ts))
plt.imshow(cv2.cvtColor(target, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR; matplotlib expects RGB
plt.show()

## Time passed: 0.016 s

enter image description here
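Since the question mentions that the images are stored in 1-bit mode, the threshold step may be unnecessary there. A NumPy-only variant of step 3 could look like the sketch below, assuming the image has already been loaded into a 2-D boolean array in which True is white paper. The function name bounding_box and the synthetic page are illustrative, not taken from the answer.

```python
import numpy as np

def bounding_box(mask):
    """Return (left, top, right, bottom), inclusive, of the True pixels.

    Returns None when the mask contains no True pixel.
    """
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing the object
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing the object
    if rows.size == 0:
        return None
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

# Synthetic stand-in for a 1-bit scan: True = white paper, False = ink
page = np.ones((100, 120), dtype=bool)
page[40:60, 30:50] = False

left, top, right, bottom = bounding_box(~page)  # invert so the object is True
crop = page[top:bottom + 1, left:right + 1]     # +1: slice ends are exclusive
print((left, top, right, bottom))  # (30, 40, 49, 59)
```

Projecting onto rows and columns with any() avoids building the full coordinate lists that np.where returns, which may help on large images, though this has not been timed here.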

Kinght 金