I have a collection of PDF files that store information in the following two-column format:
Line no 1 Line no. 11
Line no 2 Line no. 12
. .
. .
. .
Line no 10 Line no N
I am using the pdfplumber library to extract the text content of the PDFs, but instead of reading lines 1 to 10 first and then moving on to line 11 (and so on), pdfplumber reads line 1 and line 11 together as a single line. Consider the output below:
Line no 1 Line no. 11
Line no 2 Line no. 12
.
.
.
What I expect:
Line no. 1
Line no. 2
.
.
.
Line no. 11
.
.
.
Here is the link to the PDF that I am trying to read.
I tried the extract_table() utility from the pdfplumber library with table settings, but it didn't work (I referred to this answer: https://stackoverflow.com/a/63133876/10011503).
Do I need to pass some specific table settings as an argument to pdfplumber.open('path_to_pdf').pages[0].extract_table(), or is there any other utility and/or workaround?
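
One workaround I am considering, as a minimal sketch only: crop the page into vertical strips and extract the text of each strip in turn, so the left column is read in full before the right one. This assumes the columns are of equal width and split exactly at the middle of the page, which may not hold for every PDF. The helper names column_bboxes and extract_by_columns are hypothetical (my own, not part of pdfplumber); the pdfplumber calls used are page.bbox, page.crop() and extract_text().

```python
def column_bboxes(page_bbox, n_columns=2):
    """Split a page bounding box into equal-width vertical strips.

    page_bbox is (x0, top, x1, bottom), the same order pdfplumber uses.
    Assumption: the columns are equally wide, which may not match the real PDF.
    """
    x0, top, x1, bottom = page_bbox
    step = (x1 - x0) / n_columns
    # Reuse the exact right edge for the last strip to avoid float overshoot.
    edges = [x0 + i * step for i in range(n_columns)] + [x1]
    return [(edges[i], top, edges[i + 1], bottom) for i in range(n_columns)]


def extract_by_columns(pdf_path, page_number=0, n_columns=2):
    """Extract text column by column (left to right) from one page."""
    import pdfplumber  # imported here so column_bboxes works without it

    texts = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        for bbox in column_bboxes(page.bbox, n_columns):
            # crop() returns a page-like object limited to the given box
            texts.append(page.crop(bbox).extract_text() or "")
    return "\n".join(texts)
```

It would be called as extract_by_columns('path_to_pdf'). If the gap between the columns is not at the midpoint, the boxes would have to be chosen by hand or derived from the word positions on the page.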