Detect and analyze text using Amazon Textract from a multi-page PDF document synchronously

Question

A further question: will it affect the accuracy of text detection by Amazon Textract?

Do I need to pre-process the image to get better results from Amazon Textract?

Himanshu Gupta · Answer 1 · 2022-09-29T04:44:41.010

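I converted the PDF to PNG using the pdftoppm command. In Python -> subprocess.Popen(['pdftoppm', '-png', 'Sample.pdf', 'Sample']) (each argument has to be its own list element; a single string in the list only works with shell=True).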
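For reference, the conversion step could be wrapped roughly like this. It is only a sketch: the helper names build_pdftoppm_command and convert_pdf_to_png are made up for illustration, and the optional dpi argument (passed to pdftoppm as -r) is an assumption about what you might want to tune, not something from my original run.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, output_prefix, dpi=None):
    """Return the argument list for converting a PDF to one PNG per page."""
    command = ["pdftoppm", "-png"]
    if dpi is not None:
        # -r sets the rendering resolution in DPI.
        command += ["-r", str(dpi)]
    command += [str(pdf_path), str(output_prefix)]
    return command


def convert_pdf_to_png(pdf_path, output_prefix, dpi=None):
    """Run pdftoppm and return the generated PNG paths sorted by file name."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found on PATH")
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
    prefix = Path(output_prefix)
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

Calling convert_pdf_to_png("Sample.pdf", "Sample") should leave one Sample-<page>.png file per page next to the PDF. subprocess.run is used instead of Popen here so the call waits for the conversion to finish before the files are listed.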

Amazon Textract was more accurate on the PDF file than on the PNG images, because the PDF is the original document.
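If you want to check this on your own document, one rough way is to send each PNG page to the synchronous DetectDocumentText call and look at the LINE blocks and their confidence scores. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names extract_lines, detect_lines and mean_confidence are hypothetical, and the confidence score is only a proxy for accuracy, not a measurement of it.

```python
def extract_lines(response):
    """Pull (text, confidence) pairs for LINE blocks out of a Textract response."""
    return [
        (block["Text"], block["Confidence"])
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]


def detect_lines(image_bytes, client=None):
    """Run synchronous text detection on one page image given as raw bytes."""
    if client is None:
        # Imported lazily so the pure helpers work without boto3 or AWS access.
        import boto3
        client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)


def mean_confidence(lines):
    """Average confidence over detected lines, or 0.0 if nothing was detected."""
    if not lines:
        return 0.0
    return sum(confidence for _, confidence in lines) / len(lines)
```

Usage would be something like detect_lines(open("Sample-1.png", "rb").read()) for each page, then comparing the text and mean_confidence against what you get for the PDF.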

edited Sep 29 '22 at 04:44

answered Jul 10 '20 at 10:24

Himanshu Gupta

What sort of inaccuracies did you see with PNG? Are more blocks detected in PDF? Did you see more inaccuracies in text with PNG files vs PDF? – Zaki Aziz Jun 16 '21 at 20:30

@Zakir the inaccuracy was with text in PNG. – Himanshu Gupta Aug 27 '21 at 08:52
