Is such a thing possible? Yes, although it is not recommended. In my opinion, your best bet is to open your existing file, convert it to an editable format, remove whatever text you don't want, and then convert it back.
However, you could extract the text with PyPDF2 and then remove the unwanted parts from it in memory, starting from something like:
import PyPDF2
# creating a pdf file object
pdfFileObj = open('example.pdf', 'rb')
# creating a pdf reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
# printing number of pages in pdf file
print(pdfReader.numPages)
# creating a page object
pageObj = pdfReader.getPage(0)
# extracting text from page
print(pageObj.extractText())
# closing the pdf file object
pdfFileObj.close()
Line by line, this program would:
pdfFileObj = open('example.pdf', 'rb')
Open example.pdf and save the file object as pdfFileObj.
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
Create a PdfFileReader object by passing it the PDF file object; the result is a PDF reader object.
print(pdfReader.numPages)
Give the number of pages.
pageObj = pdfReader.getPage(0)
Create an object of the PageObject class. The PDF reader object has a getPage() function, which takes a page number (starting from index 0) as an argument and returns the page object.
print(pageObj.extractText())
Extract text from the PDF page.
pdfFileObj.close()
Close the PDF file object.
To remove the text, replace it in the extracted string. The replacement text would simply be the empty string "", as you want to remove every instance of a certain piece of text.
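As a rough sketch of that replacement step, something like the following could work. The helper names remove_text and extract_pages_without are made up for illustration, and the second one assumes PyPDF2 2.0 or later, where the reader class is PdfReader, pages are accessed through reader.pages, and text comes from page.extract_text() (PyPDF2 3.0 dropped the older PdfFileReader / getPage / extractText names used above).

```python
from typing import List


def remove_text(text: str, unwanted: str) -> str:
    """Return text with every occurrence of unwanted replaced by ""."""
    return text.replace(unwanted, "")


def extract_pages_without(path: str, unwanted: str) -> List[str]:
    """Extract each page's text and strip unwanted from it.

    Hypothetical helper; assumes PyPDF2 >= 2.0 is installed.
    """
    # Imported lazily so remove_text() still works without PyPDF2.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # One cleaned string per page, in page order.
        return [remove_text(page.extract_text(), unwanted) for page in reader.pages]


if __name__ == "__main__":
    # Pure-string demonstration; no PDF file is needed for this part.
    print(remove_text("Draft copy. Do not share. Draft copy.", "Draft copy."))
```

Keep in mind that this only changes the extracted strings in memory; the PDF on disk is left untouched, which is why converting to an editable format is still the safer route.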