Team,
I have a PDF file with 6,000+ pages. What's the fastest way to extract the text?
I am using this code:
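import pdfplumber

page_texts = []
with pdfplumber.open(pdf_dir) as pdf:  # pdf_dir is the path to the PDF file
    for page in pdf.pages:
        # fall back to "" in case a page yields no text
        text = page.extract_text() or ""
        page_texts.append(text)
all_text = "\n".join(page_texts)  # join once instead of += on every page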
but it takes a long time to complete.
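One direction that may help, since each page is extracted independently, is to split the page range across several processes. The following is only a rough sketch under assumptions: the helper names (page_ranges, extract_range, extract_parallel), the worker count of 4, and passing the page count in by hand are all hypothetical, and it assumes that pdf.pages can be sliced like a list. Whether it is actually faster depends on the file and the machine.

```python
from concurrent.futures import ProcessPoolExecutor


def page_ranges(n_pages, n_chunks):
    """Split range(n_pages) into at most n_chunks contiguous (start, stop) pairs."""
    if n_pages <= 0 or n_chunks <= 0:
        return []
    size = -(-n_pages // n_chunks)  # ceiling division
    return [(start, min(start + size, n_pages)) for start in range(0, n_pages, size)]


def extract_range(job):
    """Worker: open the PDF in this process and extract one range of pages."""
    import pdfplumber  # imported lazily, so only the workers need the library

    path, start, stop = job
    texts = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages[start:stop]:
            # fall back to "" in case a page yields no text
            texts.append(page.extract_text() or "")
    return texts


def extract_parallel(path, n_pages, workers=4):
    """Return a list with one text string per page, in page order."""
    jobs = [(path, start, stop) for start, stop in page_ranges(n_pages, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(extract_range, jobs)  # map() keeps the job order
    return [text for chunk in chunks for text in chunk]
```

A call such as extract_parallel(pdf_dir, 6000) would need to sit under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning.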
Also, after extracting, I need to search for the address lines, for which I am using this code:
import re

address_line = re.compile(r'(: \d{5})')
# search the full extracted text, not just the last page's `text`
for line in all_text.split('\n'):
    if address_line.search(line):
        print(line)
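The search could also run page by page, so that each match can be reported with its page number and nothing has to be concatenated first. This is a minimal sketch, assuming the page texts are available as any iterable of strings; find_address_lines is a hypothetical helper name, and the pattern is the same one as above without the unused capture group.

```python
import re

# Same pattern as in the question: a colon, a space, then five digits.
ADDRESS_LINE = re.compile(r": \d{5}")


def find_address_lines(page_texts):
    """Yield (page_number, line) for every line that matches the pattern."""
    for page_number, text in enumerate(page_texts, start=1):
        # `text or ""` skips pages for which no text was extracted
        for line in (text or "").splitlines():
            if ADDRESS_LINE.search(line):
                yield page_number, line


# Made-up sample data standing in for the extracted pages.
sample_pages = [
    "Name: Bob\nZip: 12345 Main St",
    None,
    "City: 1234\nPostal code : 90210",
]
matches = list(find_address_lines(sample_pages))
for page_number, line in matches:
    print(page_number, line)
```

Because it is a generator, it can be fed directly from the extraction loop and starts printing matches before the whole file has been processed.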
I appreciate your help in advance :)