I have a lot of PDFs in the same format (only the contents differ). Each PDF contains text, tables, etc. It also has bold text: I want to extract the bold text and turn it into column names, and extract the details under each bold heading and turn them into rows. I want to do all of this in Python. Any ideas?
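One way to turn "bold heading → column, text under it → row" into code is a minimal sketch, assuming a layout-aware library such as pdfplumber supplies the words together with their font names (plain text extraction loses the bold formatting). It also assumes bold fonts have "Bold" in their name (e.g. `Arial-BoldMT`), which is common but not guaranteed. The function names `group_by_bold_headings` and `to_table` are hypothetical helpers, and the code itself only uses the standard library.

```python
from typing import Dict, List, Optional


def group_by_bold_headings(words: List[dict]) -> Dict[str, str]:
    """Map each bold heading to the non-bold text that follows it.

    Assumes `words` is in reading order and each item has "text" and
    "fontname" keys (the shape pdfplumber's extract_words returns when
    called with extra_attrs=["fontname"]).
    """
    sections: Dict[str, List[str]] = {}
    heading_parts: List[str] = []
    current: Optional[str] = None
    for word in words:
        # Heuristic: bold fonts usually carry "Bold" in their name
        if "bold" in word["fontname"].lower():
            heading_parts.append(word["text"])  # consecutive bold words form one heading
            continue
        if heading_parts:
            current = " ".join(heading_parts).rstrip(":")
            sections.setdefault(current, [])
            heading_parts = []
        if current is not None:  # text before the first heading is skipped
            sections[current].append(word["text"])
    if heading_parts:  # a trailing heading with no text after it
        sections.setdefault(" ".join(heading_parts).rstrip(":"), [])
    return {heading: " ".join(body) for heading, body in sections.items()}


def to_table(records: List[Dict[str, str]]) -> List[List[str]]:
    """Headings become the header row; each PDF's record becomes one data row."""
    columns: List[str] = []
    for record in records:
        for heading in record:
            if heading not in columns:  # keep first-seen column order
                columns.append(heading)
    rows = [[record.get(col, "") for col in columns] for record in records]
    return [columns] + rows
```

With pdfplumber installed, the words for one file could come from `[w for page in pdf.pages for w in page.extract_words(extra_attrs=["fontname"])]` inside `with pdfplumber.open(path) as pdf:`. Calling `group_by_bold_headings` once per PDF gives one record per file; `to_table` then lines them up so that a heading missing from one PDF becomes an empty cell. The result can be written with `csv.writer(...).writerows(...)`, or the list of records can be passed straight to `pandas.DataFrame`, which performs the same column alignment.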
This is what I have tried so far; I have no idea how to continue from here:
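import PyPDF2

# PdfReader replaces PdfFileReader, which newer PyPDF2 releases no longer support
reader = PyPDF2.PdfReader("246427 postop note.pdf")
print(len(reader.pages))  # number of pages

# Collect the text of every page rather than a hard-coded range(0, 4)
text = ""
for page in reader.pages:
    text += page.extract_text()
print(text)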
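PyPDF2's `extract_text` returns one flat string per page, so table cell boundaries are lost along with the bold formatting. For the tables, a minimal sketch assuming pdfplumber instead: its `page.extract_tables()` returns each table as a list of rows, where each cell is a string or `None`. The helper below (hypothetical names `clean_cell` and `table_to_records`) treats the table's first row as the header and turns every other row into a dict. It is plain Python, so it runs without pdfplumber installed.

```python
from typing import Dict, List, Optional


def clean_cell(cell: Optional[str]) -> str:
    """Empty or merged cells come back as None; cell text may contain line breaks."""
    return " ".join((cell or "").split())


def table_to_records(table: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Use the table's first row as column names and the remaining rows as data."""
    if not table:
        return []
    header = [clean_cell(cell) for cell in table[0]]
    return [
        dict(zip(header, (clean_cell(cell) for cell in row)))
        for row in table[1:]
    ]
```

Inside `with pdfplumber.open("246427 postop note.pdf") as pdf:`, looping over `pdf.pages` and calling `table_to_records(table)` for each `table` in `page.extract_tables()` would yield row dicts for every table. Depending on how the tables are drawn, `extract_tables` may need its `table_settings` tuned before the cells come out cleanly.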