Assign part of a string to list

Question

I'm extracting text from a PDF using PDFminer.six. Is there a way to find all instances of a certain phrase in the resulting string? I know how to find the phrases and remove them, but I can't seem to save the text around each phrase to a variable or list. Is there an easy way to do this that I've overlooked?

from pdfminer.high_level import extract_text

text = extract_text('Pdf Scanner/test.pdf')

textf = text.find("vejkode")

print(len(text))

This is what I have so far.
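`str.find` returns only the index of the first match, or -1 if there is none, so a single call cannot locate every instance. The sketch below collects every match position by passing `find`'s optional start argument and resuming just past each hit. `find_all` is a hypothetical helper name, not part of pdfminer.six; it works on any string, including the output of `extract_text`.

```python
def find_all(text, keyword):
    """Return the start index of every (non-overlapping) occurrence of keyword."""
    if not keyword:
        return []  # an empty keyword would match everywhere and loop forever
    positions = []
    start = text.find(keyword)
    while start != -1:
        positions.append(start)
        # resume the search just past the current match
        start = text.find(keyword, start + len(keyword))
    return positions


# Illustrative string; in practice pass the result of extract_text(...)
print(find_all("vejkode 12 ... vejkode 34", "vejkode"))  # → [0, 15]
```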

Would [this](https://stackoverflow.com/a/4664889/12118546) help? — Roman Pavelka, Oct 25 '22 at 19:02

score 0 · Accepted Answer · answered Oct 25 '22 at 19:08

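from pdfminer.high_level import extract_text

def extract_phrase(keyword='vejkode', file='test.pdf', window=30):
    text = extract_text(file)
    start = text.find(keyword)
    if start == -1:
        return []  # keyword does not occur in the text
    end = start + len(keyword)
    # clamp the left edge: a negative index would slice from the end of the string
    phrase = text[max(0, start - window):end + window]
    return phrase.split()[1:-1]  # drop the possibly truncated word at each end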
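The function above handles only the first occurrence, while the question asks for all of them. A minimal sketch that applies the same character-window idea to every match, swapping `str.find` for `re.finditer` (with `re.escape` so the keyword is matched literally). `phrase_contexts` and the sample string are hypothetical; the text would normally come from `extract_text`.

```python
import re


def phrase_contexts(text, keyword, window=30):
    """Return the text around every occurrence of keyword as a list of strings."""
    contexts = []
    # re.escape makes characters such as '.' in the keyword match literally
    for match in re.finditer(re.escape(keyword), text):
        # clamp the left edge so matches near the start don't wrap around
        left = max(0, match.start() - window)
        right = match.end() + window  # slicing past the end is safe in Python
        contexts.append(text[left:right])
    return contexts


# Hypothetical sample; with pdfminer.six use text = extract_text('Pdf Scanner/test.pdf')
sample = "aa KEY bb KEY cc"
print(phrase_contexts(sample, "KEY", window=3))
```

To mirror the accepted answer's output, apply `.split()[1:-1]` to each snippet to drop the partial words at its edges.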
