Writing a unit test for a function that verifies if a file is a PDF or not?

Question

I am trying to write unit test cases for my project and am relatively new to this. I have a function which checks if a given file is a PDF or not (function below):

import os
import PyPDF2

def file_verify(orig_pdf):
    try:
        # Parsing raises PdfReadError if the content is not a valid PDF
        with open(orig_pdf, 'rb') as f:
            PyPDF2.PdfFileReader(f)
    except PyPDF2.utils.PdfReadError:
        return orig_pdf, "error: Invalid PDF is not supported!"
    else:
        return orig_pdf, os.path.basename(orig_pdf) + " is of PDF file format"

Now how would I write a unit test for this function in Python to ensure it is working correctly?

Edit: I was able to write the unit test function so far (based on the information I received online) like this:

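import unittest

testdata_filename = 'my pdf location'

class TestVerifyPDF(unittest.TestCase):

    def setUp(self):
        # Open in binary mode and keep the handle separate from its content,
        # so that tearDown can still close the file
        self.testfile = open(testdata_filename, 'rb')
        self.testdata = self.testfile.read()

    def tearDown(self):
        self.testfile.close()

    def test_pdf(self):
        <test here>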

I'd start by splitting the file handling out from the verification function, so that you can pass content of your test's choosing, and so have the tests nicely separated from any filesystem contents. — Toby Speight, Jan 22 '20 at 15:00
If I understood you right, I think you are saying to split the output and separate orig_pdf from the message prompt? Okay, but what would the unit test case look like? I do not think I should be using PyPDF2 again in my unit test script. I am not sure how the unit test would be set up. I have edited my post to reflect the unit test I have written so far. — Vaishak N Chandran, Jan 22 '20 at 15:06
Why would you return the file name the caller already has? The other strings are arbitrary stand-ins for `True` and `False`, which is what this function *should* return. — chepner, Jan 23 '20 at 23:00
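The two suggestions in the comments above (separate the file handling from the check, and return a boolean) could be sketched roughly as follows. This is only an illustration: `is_pdf`, `file_is_pdf`, `parse_pdf` and this local `PdfReadError` are hypothetical names, and `parse_pdf` is a stand-in that merely checks the `%PDF-` magic bytes. In the real project the `parse` and `error` arguments would presumably be the PyPDF2 reader and its read error.

```python
import io


class PdfReadError(Exception):
    """Hypothetical stand-in for the parser's 'not a PDF' exception."""


def parse_pdf(stream):
    """Hypothetical stand-in for the real parser: checks only the magic bytes."""
    if not stream.read().startswith(b"%PDF-"):
        raise PdfReadError("missing %PDF- header")


def is_pdf(stream, parse=parse_pdf, error=PdfReadError):
    """Return True if `parse` accepts the binary stream, False otherwise."""
    try:
        parse(stream)
    except error:
        return False
    return True


def file_is_pdf(path):
    """Thin wrapper that does the file handling and nothing else."""
    with open(path, "rb") as f:
        return is_pdf(f)


# The check can now be exercised with in-memory content, no files needed.
print(is_pdf(io.BytesIO(b"%PDF-1.4\n")))   # True
print(is_pdf(io.BytesIO(b"plain text")))   # False
```

With this split, a test can feed `is_pdf` any bytes it likes through `io.BytesIO`, and only the small `file_is_pdf` wrapper still depends on the filesystem.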

score 0 · Answer 1 · answered Jan 23 '20 at 22:47

My recommendation would be to test such a piece of code with integration testing, because there is not much value in unit-testing it. Your function file_verify is dominated by interactions with depended-on components. Think about what kinds of bugs you might want to find:

open called with wrong arguments (maybe orig_pdf not containing valid characters or not being a string, or maybe 'rb' not being well formed or file modes wrongly chosen)
open called with arguments in the wrong order, arguments missing, or maybe the wrong function altogether (should it rather be os.fdopen?)
PyPDF2.PdfFileReader called with the wrong argument; for example, it might expect a file name rather than a file object
PyPDF2.PdfFileReader throws a different exception than expected
...

These bugs cannot be found with unit-testing, since with unit-testing you aim at finding bugs in the isolated code. This means, for example, that you use a test double (a stub, a mock or something similar) for the open function instead of the real open function. Your substituted open function will be used to check whether your code under test calls the open function according to your (possibly wrong) assumptions about how it should be called.
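As a rough sketch of what such an isolated test looks like, the following replaces the built-in open with `unittest.mock.mock_open`. To keep it runnable without PyPDF2, `parse_pdf` and this local `PdfReadError` are hypothetical stand-ins for the real parser, and `file_verify` is simplified to return a boolean. These are assumptions for illustration, not the code from the question.

```python
import unittest
from unittest import mock


class PdfReadError(Exception):
    """Hypothetical stand-in for the parser's 'not a PDF' exception."""


def parse_pdf(stream):
    """Hypothetical stand-in for the real parser: checks only the magic bytes."""
    if not stream.read().startswith(b"%PDF-"):
        raise PdfReadError("missing %PDF- header")


def file_verify(path):
    """Simplified variant of the function under test; returns a bool."""
    try:
        with open(path, "rb") as f:
            parse_pdf(f)
    except PdfReadError:
        return False
    return True


class TestFileVerifyIsolated(unittest.TestCase):
    def test_passes_expected_arguments_to_open(self):
        fake_open = mock.mock_open(read_data=b"%PDF-1.4\n")
        with mock.patch("builtins.open", fake_open):
            self.assertTrue(file_verify("some.pdf"))
        # This only confirms that the call matches our assumption ("rb");
        # it cannot tell us whether the assumption itself is right.
        fake_open.assert_called_once_with("some.pdf", "rb")

    def test_rejects_non_pdf_content(self):
        fake_open = mock.mock_open(read_data=b"plain text")
        with mock.patch("builtins.open", fake_open):
            self.assertFalse(file_verify("notes.txt"))
```

Saved in a test module, this can be run with `python -m unittest`. Note that neither test touches the filesystem, which is exactly why they cannot reveal a wrong assumption about open itself.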

To make the example more concrete, if your assumption is that the file mode should be 'rb', your tests with the substituted open will check that you pass 'rb' as argument to open. However, if your assumption was wrong in the first place, your unit tests will not help you to detect it. Instead, testing against the real open function will show you, for example, if your mode argument was malformed, and testing against the real PyPDF2.PdfFileReader may show that the function in fact requires a writable file.
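The contrast can be made visible with a deliberately wrong mode string. In this sketch, `read_bytes` is a hypothetical helper (not part of the question's code) that uses the malformed mode "rw". The test with the substituted open passes, because it only repeats the wrong assumption, while the test against the real open shows that Python rejects that mode with a ValueError.

```python
import os
import tempfile
import unittest
from unittest import mock


def read_bytes(path, mode="rw"):
    """Hypothetical helper whose mode string is deliberately malformed."""
    with open(path, mode) as f:
        return f.read()


class TestWithSubstitutedOpen(unittest.TestCase):
    def test_passes_because_it_encodes_the_wrong_assumption(self):
        fake_open = mock.mock_open(read_data=b"%PDF-1.4\n")
        with mock.patch("builtins.open", fake_open):
            read_bytes("some.pdf")
        # The mock accepts any mode string, so the bug goes unnoticed.
        fake_open.assert_called_once_with("some.pdf", "rw")


class TestWithRealOpen(unittest.TestCase):
    def test_real_open_rejects_the_malformed_mode(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "sample.pdf")
            with open(path, "wb") as f:
                f.write(b"%PDF-1.4\n")
            # The real open validates the mode and raises ValueError.
            with self.assertRaises(ValueError):
                read_bytes(path)
```

In a real integration test you would assert the expected result rather than the error; the `assertRaises` here only serves to show that the real function exposes a problem the mock hides.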
