How to extract bold text, non-bold text, and tables from a PDF in Python?

Asked Oct 15 '22 at 06:28

Active Oct 15 '22 at 06:28

Viewed 456 times

I have many PDFs in the same format (only the contents differ). Each PDF contains text, tables, etc. Some of the text is bold: I want to extract the bold text and use it as column names, and extract the details under each bold heading as rows. The PDFs also contain tables that I want to extract. I want to do all of this in Python. Any ideas?

This is what I have tried so far. I don't know how to continue from here.

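from PyPDF2 import PdfReader  # PdfFileReader/getPage/extractText are deprecated names

# Open the PDF and print its page count
reader = PdfReader("246427 postop note.pdf")
print(len(reader.pages))

# Concatenate the plain text of the first four pages (font styling is lost here)
text = ""
for i in range(4):
    text += reader.pages[i].extract_text()
print(text)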

Use Camelot (`camelot-py`) for tables – Mehmaam Oct 15 '22 at 06:32
And for bold text, use PyMuPDF (`fitz`) ... https://stackoverflow.com/questions/68382847/extracting-text-using-flags-to-focus-on-bold-italic-font-using-pymupdf – Sachin Kohli Oct 15 '22 at 07:32
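
The PyMuPDF suggestion above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for these particular files: it assumes the headings are flagged as bold in the font flags that PyMuPDF reports (bit 4 of each span's `flags`), and the helper names `group_by_bold` and `read_page_dicts` are made up for this example. Some PDFs mark bold only through the font name (e.g. a name containing "Bold"), in which case the check would need to look at `span["font"]` instead.

```python
BOLD_FLAG = 1 << 4  # bit 4 of a PyMuPDF span's "flags" value marks bold text


def group_by_bold(page_dicts):
    """Map each bold heading to the non-bold text that follows it.

    page_dicts is a list of dictionaries as returned by
    page.get_text("dict") in PyMuPDF (hypothetical helper, for illustration).
    """
    sections = []      # list of (heading_parts, body_parts) pairs
    prev_bold = False  # True if the previous non-empty span was bold
    for page in page_dicts:
        for block in page.get("blocks", []):
            # Image blocks carry no "lines" key, so they are skipped here.
            for line in block.get("lines", []):
                for span in line["spans"]:
                    text = span["text"].strip()
                    if not text:
                        continue
                    bold = bool(span["flags"] & BOLD_FLAG)
                    if bold and prev_bold:
                        # Consecutive bold spans are treated as one heading.
                        sections[-1][0].append(text)
                    elif bold:
                        sections.append(([text], []))
                    elif sections:
                        # Text before the first heading is dropped.
                        sections[-1][1].append(text)
                    prev_bold = bold
    # If a heading repeats, the later occurrence overwrites the earlier one.
    return {" ".join(head): " ".join(body) for head, body in sections}


def read_page_dicts(path):
    """Load every page of a PDF as a PyMuPDF text dictionary."""
    import fitz  # PyMuPDF; imported here so group_by_bold works without it

    with fitz.open(path) as doc:
        return [page.get_text("dict") for page in doc]
```

Calling `group_by_bold(read_page_dicts("246427 postop note.pdf"))` would give one dictionary per PDF, and a list of such dictionaries can be passed to `pandas.DataFrame` so that the bold headings become the column names. For the tables, `camelot.read_pdf(path)` from the Camelot suggestion returns a list of tables, each of which exposes a `.df` DataFrame.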

0 Answers