I'm trying to process PDFs using PyMuPDF, and I'm running this Python file, process_pdf.py, in the terminal:
> import sys, fitz
> fname = sys.argv[1] # get document filename
> doc = fitz.open(fname) # open document
> out = open(fname + ".txt", "wb") # open text output
> for page in doc: # iterate the document pages
>     text = page.get_text().encode("utf8") # get plain text (is in UTF-8)
>     out.write(text) # write text of page
> out.close()
Then I feed in a PDF from the terminal, for example python process_pdf.py 1.pdf. This produces 1.pdf.txt (a text version of 1.pdf), since the script appends ".txt" to the full file name.
My question: can I make a simple program in the terminal that runs python process_pdf.py document_name.pdf multiple times, the way a for-loop would? I ask because the file names are sequential numbers.
I thought about making a for-loop such as
> for i in range(1,101):
>     python process_pdf.py i.pdf
But that isn't how Python works. P.S. Sorry if this doesn't make any sense; I'm very new to coding :(
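
For what it's worth, the loop idea can probably be made to work from inside Python by asking it to launch the terminal command once per file. Below is a minimal sketch, not a definitive solution. It assumes process_pdf.py sits in the current folder and the PDFs are named 1.pdf through 100.pdf; the helper names pdf_names and run_all are made up for illustration. It uses subprocess.run from the standard library and skips any file that does not exist.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "process_pdf.py"  # the script from the question; assumed to be in the current folder


def pdf_names(first=1, last=100):
    """Return the sequential file names first.pdf, ..., last.pdf (hypothetical helper)."""
    return [f"{i}.pdf" for i in range(first, last + 1)]


def run_all(names):
    """Run process_pdf.py once per existing file; return the names that succeeded."""
    done = []
    for name in names:
        if not Path(name).exists():
            print(f"skipping {name}: file not found")
            continue
        # Equivalent to typing "python process_pdf.py 1.pdf" in the terminal;
        # sys.executable is the Python interpreter running this script.
        result = subprocess.run([sys.executable, SCRIPT, name])
        if result.returncode == 0:
            done.append(name)
        else:
            print(f"{name}: {SCRIPT} exited with code {result.returncode}")
    return done


if __name__ == "__main__":
    run_all(pdf_names(1, 100))
```

Saved under some name of your choosing (say run_all.py, also hypothetical), it would be started with python run_all.py. The key difference from the attempt above is that i has to be turned into a file name string such as f"{i}.pdf", and the terminal command has to be handed to subprocess.run rather than written directly inside the loop.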