
I have some sample images. How can I extract tabular data from the images and store it in JSON format?

image 1

Saurabh Kumar

1 Answer


Use pytesseract. The code will be something like this. It is only an example and may not solve the whole problem, so try different modifications. It works for black text; for blue or any other colour you will have to create a mask for that colour and then extract the data.

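import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")

# Shrink in place so neither side exceeds 2024 px.
# thumbnail() modifies the image and returns None, so do not reassign it.
maxsize = (2024, 2024)
im.thumbnail(maxsize, Image.LANCZOS)

# Remove speckle noise, then boost the contrast before binarizing.
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)

im = enhancer.enhance(2)
im = im.convert('1')  # 1-bit black and white

im.save('mod_file.jpg')
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)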
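Since the question asks for JSON, here is a rough sketch of turning the OCR output into JSON records. It rests on assumptions that may not hold for your images: each table row comes out as one line of text, cells are separated by whitespace and contain no spaces themselves, and the first non-empty line is the header. The function name `ocr_text_to_json` is made up for this example.

```python
import json


def ocr_text_to_json(text):
    """Convert whitespace-separated OCR lines into a JSON array of records.

    Assumes the first non-empty line is the header row and that no cell
    contains a space. Real OCR output usually needs extra cleanup first.
    """
    # Split each non-empty line into cells on runs of whitespace.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return "[]"
    header, body = rows[0], rows[1:]
    # zip() stops at the shorter sequence, so extra cells in a row are dropped.
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, indent=2)


# Hypothetical usage with the `text` returned by pytesseract:
# with open("table.json", "w") as f:
#     f.write(ocr_text_to_json(text))
```

If the OCR merges or splits columns, the records will be wrong, so check a few rows by hand before relying on the result.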

For red text, for example, you can refer to this post on colour detection. After isolating the red text, binarize the image and then run:

text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))

Similarly, you will have to repeat the same process for blue and so on. I believe you can easily do it yourself; just play around with the threshold values.
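A minimal sketch of such a colour mask using only Pillow is shown here. The function name `isolate_hue` and all threshold values are assumptions to tune per image, not values taken from the linked post. Pillow scales hue to 0-255 in HSV mode, so blue sits near 170 and red sits near 0 and wraps around 255.

```python
from PIL import Image, ImageChops


def isolate_hue(im, hue_lo, hue_hi, min_sat=80, min_val=50):
    """Return an 'L' image with pixels in the hue range black on white.

    Hue uses Pillow's 0-255 scale. If hue_lo > hue_hi the range wraps
    around 255, which is needed for red. Thresholds are guesses to tune.
    """
    h, s, v = im.convert("RGB").convert("HSV").split()
    if hue_lo <= hue_hi:
        in_range = lambda p: hue_lo <= p <= hue_hi
    else:
        in_range = lambda p: p >= hue_lo or p <= hue_hi
    # Build one 0/255 mask per channel, then combine them (logical AND).
    hue_mask = h.point(lambda p: 255 if in_range(p) else 0)
    sat_mask = s.point(lambda p: 255 if p >= min_sat else 0)
    val_mask = v.point(lambda p: 255 if p >= min_val else 0)
    mask = ImageChops.multiply(ImageChops.multiply(hue_mask, sat_mask), val_mask)
    # Invert so the selected text is black on a white background.
    return ImageChops.invert(mask)


# Hypothetical usage (requires pytesseract and the Tesseract binary):
# blue_only = isolate_hue(Image.open("temp.jpg"), 140, 200)
# text = pytesseract.image_to_string(blue_only)
```

The saturation and value thresholds keep white, grey, and black pixels out of the mask, since their hue is not meaningful.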

Andy_101
  • Thanks for the answer, but I am not getting the exact data. The output is: char. short int int lone int float double 8 long double 12 | MEANING LESS ah AH = void – Saurabh Kumar Aug 29 '19 at 11:56