I have some sample images. How can I extract tabular data from the images and store it in JSON format?
Google python OCR libraries. – Alex Hall Aug 29 '19 at 05:29
pytesseract.image_to_string(img, lang='eng') – Zhubei Federer Aug 29 '19 at 05:59
1 Answer
Use pytesseract. The code will look something like the following. It is only an example and may not solve the whole problem, so try different modifications. It works for black text; for blue or any other colour you will have to create a mask for that colour and then extract the data from the masked image.
import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")
maxsize = (2024, 2024)
# thumbnail() resizes in place and returns None, so its result must not be reassigned
im.thumbnail(maxsize, Image.LANCZOS)
im = im.filter(ImageFilter.MedianFilter())  # smooth out speckle noise
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)  # double the contrast
im = im.convert('1')  # binarize to pure black and white
im.save('mod_file.jpg')
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)
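Since the question asks for JSON rather than a plain string, here is a rough sketch of one way to get there. It assumes word positions are available, for example from `pytesseract.image_to_data`, and groups words into rows by their vertical position. The helper names (`words_to_rows`, `rows_to_json`, `ocr_words`) and the `row_tolerance` value are my own illustrative choices, not part of pytesseract, and the sketch treats every word as its own cell, so multi-word cells would need extra merging logic.

```python
import json


def words_to_rows(words, row_tolerance=10):
    """Group OCR word boxes into table rows by vertical position.

    `words` is a list of dicts with 'text', 'left' and 'top' keys (pixels).
    A word joins the current row when its 'top' is within `row_tolerance`
    pixels of the first word in that row (the tolerance is a guess to tune).
    """
    rows = []
    for word in sorted(words, key=lambda w: w["top"]):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order the cells from left to right
    return [[w["text"] for w in sorted(row, key=lambda w: w["left"])] for row in rows]


def rows_to_json(rows):
    """Treat the first row as the header and emit one JSON object per data row."""
    header, *body = rows
    return json.dumps([dict(zip(header, cells)) for cells in body], indent=2)


def ocr_words(image_path):
    """Run Tesseract and return the non-empty word boxes (needs pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return [
        {"text": text, "left": left, "top": top}
        for text, left, top in zip(data["text"], data["left"], data["top"])
        if text.strip()
    ]
```

With a preprocessed image this could be called as `print(rows_to_json(words_to_rows(ocr_words('mod_file.jpg'))))`. How well it works depends heavily on the OCR quality and on the table having one word per cell.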
For coloured text, for example red, you can refer to this post on red colour detection. After isolating the red text, binarize the image and then run
text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))
Similarly, you will have to repeat the same process for blue and so on. I believe you can easily do it yourself; just play around with some values.
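To make the colour-mask idea concrete, here is a minimal sketch that keeps only strongly red pixels and writes them out as black text on a white background, ready for OCR. The function names and the threshold values (`min_red`, `max_other`) are assumptions for illustration, not taken from the linked post, and will need tuning for real images.

```python
def is_red(r, g, b, min_red=150, max_other=100):
    """Return True when a pixel is strongly red (thresholds are guesses to tune)."""
    return r >= min_red and g <= max_other and b <= max_other


def red_text_mask(pixels):
    """Map RGB tuples to 0 (black) for red pixels and 255 (white) otherwise."""
    return [0 if is_red(r, g, b) else 255 for r, g, b in pixels]


def extract_red_text(src_path, dst_path="red_text_file.jpg"):
    """Save a black-on-white image containing only the red text (needs Pillow)."""
    from PIL import Image

    im = Image.open(src_path).convert("RGB")
    mask = Image.new("L", im.size)  # 8-bit greyscale image of the same size
    mask.putdata(red_text_mask(im.getdata()))
    mask.save(dst_path)
    return dst_path
```

For blue text the same approach should apply with the channel test changed, for example requiring a high `b` and low `r` and `g`.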

Andy_101
Thanks for the answer, but I am not getting the exact data. char. short int int lone int float double 8 long double 12 | MEANING LESS ah AH = void – Saurabh Kumar Aug 29 '19 at 11:56