Is there a regex or pdfgrep function in Spyder (Python) to extract the lines around a pattern match in a PDF?

Question

I want a function that returns, say, the 10 lines before and the 10 lines after each match of a pattern in a PDF.

I have checked the pdfgrep command-line tool (https://pdfgrep.org/doc.html), but I do not know how to run this command from Spyder:

```
pdfgrep -n --max-count 10 pattern foo.pdf
```

I am trying to run the command above, but I have no idea how to do it from Python. I also looked for a regex function that returns the ten lines after a match, but found nothing.
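With the `re` module alone, one possible approach is a single regex that optionally consumes up to `before` whole lines, then the line containing the pattern, then up to `after` following lines. This is a minimal sketch, not a library function: `context_matches` is a hypothetical name, and `pattern` is inserted as a raw regex (wrap literal text in `re.escape` first). Because `finditer` returns non-overlapping matches, a second hit that falls inside another hit's after-context is not reported separately.

```python
import re


def context_matches(text, pattern, before=10, after=10):
    """Return each match of `pattern` together with up to `before`/`after` surrounding lines."""
    regex = re.compile(
        # (?:[^\n]*\n){0,B}      -> up to B lines that end in a newline (before-context)
        # [^\n]*(?:pattern)[^\n]* -> the whole line containing the match
        # (?:\n[^\n]*){0,A}      -> up to A following lines (after-context)
        rf"(?:[^\n]*\n){{0,{before}}}[^\n]*(?:{pattern})[^\n]*(?:\n[^\n]*){{0,{after}}}"
    )
    return [m.group() for m in regex.finditer(text)]
```

For example, `context_matches(text, "pattern", before=10, after=10)` on the text of one PDF page returns a list of strings, each holding the matching line plus its neighbours.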

This is the code I am running to find the matches:

```python
import re

from PyPDF2 import PdfReader  # PdfReader replaces the deprecated PdfFileReader API

pattern = input("Enter string pattern to search: ")
file_name = input("Enter file path and name: ")

reader = PdfReader(file_name)

# Number pages from 1, as a PDF viewer does
for page_no, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""  # may be empty for image-only pages

    for match in re.finditer(pattern, text):
        print(f"Page no: {page_no} | Match: {match.group()}")
```
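Building on that loop, a sketch that returns the surrounding lines instead of just the match could split each page's text into lines and slice a window around every matching line. The names `page_context` and `pdf_context` are hypothetical, and "lines" here are whatever line breaks `extract_text()` happens to produce, which depends on the PDF's layout. In this sketch the context does not cross page boundaries, and PyPDF2 is imported inside `pdf_context` so the line-windowing helper can be used on its own.

```python
import re


def page_context(page_texts, pattern, before=10, after=10):
    """Return (page_no, context_lines) for every line matching `pattern`.

    `page_texts` is a list of strings, one per page; page numbers start at 1.
    """
    regex = re.compile(pattern)
    results = []
    for page_no, text in enumerate(page_texts, start=1):
        lines = text.splitlines()
        for idx, line in enumerate(lines):
            if regex.search(line):
                start = max(0, idx - before)
                # Slicing past the end of the list is safe in Python
                results.append((page_no, lines[start:idx + after + 1]))
    return results


def pdf_context(file_name, pattern, before=10, after=10):
    """Extract each page's text with PyPDF2, then collect context around matches."""
    from PyPDF2 import PdfReader  # imported here so page_context() works without PyPDF2

    reader = PdfReader(file_name)
    # extract_text() can return an empty result for image-only pages; "or ''" guards that
    texts = [page.extract_text() or "" for page in reader.pages]
    return page_context(texts, pattern, before, after)
```

Usage would look like `for page_no, block in pdf_context("foo.pdf", "pattern"): print(page_no, "\n".join(block))`.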

So my goal is to run `pdfgrep -n --max-count 10 pattern foo.pdf` from my Python code in Spyder and include it in the script above.

You can use `subprocess` to run an arbitrary shell command from Python. Which IDE you are using is neither here nor there. — tripleee, Nov 01 '22 at 10:09
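Following that suggestion, here is a minimal sketch using `subprocess.run`, assuming pdfgrep is installed and on your `PATH` (the helper names `build_pdfgrep_cmd` and `run_pdfgrep` are hypothetical). Note that, as in grep, `--max-count 10` stops after 10 matching lines; it does not print 10 lines of context. GNU grep uses `-A`/`-B`/`-C` for context, and newer pdfgrep releases appear to offer similar options, but check `pdfgrep --help` on your installation before passing them through `extra_args`.

```python
import shutil
import subprocess


def build_pdfgrep_cmd(pattern, pdf_path, max_count=10, extra_args=()):
    """Build the argument list for `pdfgrep -n --max-count N [extra] pattern file.pdf`."""
    # A list (not a shell string) avoids shell quoting problems with the pattern
    return ["pdfgrep", "-n", "--max-count", str(max_count), *extra_args, pattern, pdf_path]


def run_pdfgrep(pattern, pdf_path, max_count=10, extra_args=()):
    """Run pdfgrep and return its output lines (an empty list if nothing matched)."""
    if shutil.which("pdfgrep") is None:
        raise FileNotFoundError("pdfgrep is not installed or not on PATH")
    result = subprocess.run(
        build_pdfgrep_cmd(pattern, pdf_path, max_count, extra_args),
        capture_output=True,
        text=True,
    )
    # grep-style exit codes: 0 = at least one match, 1 = no match, 2 = error
    if result.returncode == 1:
        return []
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

In Spyder this could be called as `for line in run_pdfgrep("pattern", "foo.pdf"): print(line)`.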
