1

I have a program written in my python using the PyPDF2 package to scrape a batch of pdf files. These PDF's aren't in the greatest shape so in order for my program to run, I need to modify the pdf.py file located within the package library as recommended by this website:

https://cheonhyangzhang.wordpress.com/2015/03/31/python-pdffilereader-pdfreaderror-eof-marker-not-found/

Is there a way I can implement this change to the file while keeping the original file intact? I've tried creating a child class of PdfFileReader class and modifying the 'read' method as prescribed by my link above, however, I've found that that leads to several import dependency issues that I'd like to avoid.

Is there an easier way to do this?

Willipedia
  • 51
  • 1
  • 2

2 Answers2

0

I would recommend to copy the pdf.py file into our script directory and rename it to mypdf.py. You can then modify the copy as you please without affecting the original. You can import the package using

import mypdf

I have done something similar for shutil.py as the default buffer size is too small in Windows for transferring large files.

Alex
  • 21,273
  • 10
  • 61
  • 73
0

You can add (or redefine) a method of class using setattr() like this (where the class has been defined inline rather than being imported only for purposes of illustration):

class Class(object):
    pass

def func(self, some_other_argument):
    return some_other_argument

setattr(Class, 'func', func)

if __name__ == '__main__':
    c = Class()
    print(c.func(42))  # -> 42
martineau
  • 119,623
  • 25
  • 170
  • 301