I have tabular data in an image (see pic1).
The table needs to be extracted and saved in CSV format, with the same rows and columns as in the image.
I used pytesseract to read the data from the image, and it only partially worked. Code:
from PIL import Image
import csv
import re
import pytesseract

# Point pytesseract at the local Tesseract install
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# OCR the whole image into a single plain-text string
result = pytesseract.image_to_string(Image.open(r'D:\Sample.jpg'), lang="eng")
print(result)

# result is one str, so writerow() writes each character as a separate field
with open(r'D:\people.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(result)

# Replace everything except letters, digits, newlines and dots with a space
with open(r'D:\people.csv') as infile:
    string = infile.read()
new_str = re.sub(r'[^a-zA-Z0-9\n.]', ' ', string)
with open(r'D:\people.csv', 'w') as outfile:
    outfile.write(new_str)
Output:
The output file opens as plain text, and I am not able to get a proper CSV layout (i.e. rows and columns like the table in the image).
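To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes that each table row comes out of the OCR as one line of text and that no cell contains a space, which may well not hold for my image. The function names `ocr_text_to_rows` and `rows_to_csv` and the sample text are made up for illustration; the sample stands in for what `image_to_string` would return.

```python
import csv
import io


def ocr_text_to_rows(text):
    """Split OCR text into a list of rows, each a list of cell strings.

    Assumption: one table row per non-empty line, and cells are
    separated by whitespace with no spaces inside a cell.
    """
    rows = []
    for line in text.splitlines():
        cells = line.split()
        if cells:  # skip blank lines between rows
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, one table row per CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# Hypothetical OCR output, used here instead of a real image
sample = "Name Age City\n\nJohn 30 Paris\nAnn 25 Rome\n"
print(rows_to_csv(ocr_text_to_rows(sample)))
```

To write a file instead of a string, the same rows could be passed to `csv.writer(outfile).writerows(rows)` with the file opened as `open(path, 'w', newline='')`. Cells that contain spaces would need a different way of finding column boundaries, which is the part I am stuck on.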
Any help would be appreciated. TIA