11

I'm just getting started with using Python's mock library to help write more concise and isolated unit tests. My situation is that I've got a class that reads in data from a pretty hairy format, and I want to test a method on this class which presents the data in a clean format.

class holds_data(object):
    def __init__(self, path):
        """Pulls complicated data from a file, given by 'path'.

        Stores it in a dictionary. 
        """
        self.data = {}
        with open(path) as f:
            self.data.update(_parse(f))

    def _parse(self, file):
        # Some hairy parsing code here
        pass

    def x_coords(self):
        """The x coordinates from one part of the data
        """
        return [point[0] for point in self.data['points']]

The code above is a simplification of what I have. In reality _parse is a fairly significant method which I have test coverage for at the functional level.

I'd like to be able, however, to test x_coords at a unit test level. If I were to instantiate this class by giving it a path, it would violate the rules of unit tests because:

A test is not a unit test if:

  • It touches the filesystem

So, I'd like to be able to patch the __init__ method for holds_data and then just fill in the part of self.data needed by x_coords. Something like:

from mock import patch
with patch('__main__.holds_data.__init__') as init_mock:
    init_mock.return_value = None
    instance = holds_data()
    instance.data = {'points':[(1,1),(2,2),(3,4)]}
    assert(instance.x_coords == [1,2,3])

The code above works but it feels like it's going about this test in a rather roundabout way. Is there a more idiomatic way to patch out a constructor or is this the correct way to go about doing it? Also, is there some code smell, either in my class or test that I'm missing?

Edit: To be clear, my problem is that during initialization, my class does significant amounts of data processing to organize the data that will be presented by a method like x_coords. I want to know what is the easiest way to patch all of those steps out, without having to provide a complete example of the input. I want to only test the behavior of x_coords in a situation where I control the data it uses.

My question of whether or not there is code smell here boils down to this issue:

I'm sure this would be easier if I refactor to have x_coords be a stand alone function that takes holds_data as a parameter. If "easier to tests == better design" holds, this would be the way to go. However, it would require the x_coords function to know more about the internals of holds_data that I would normally be comfortable with. Where should I make the trade off? Cleaner code or cleaner tests?

Community
  • 1
  • 1
Wilduck
  • 13,822
  • 10
  • 58
  • 90

2 Answers2

6

Since you're only interested in testing one method, why don't you just mock the entire HoldsData class and pin on it the x_coords method?

>>> mock = MagicMock(data={'points': [(0,1), (2,3), (4,5)]})
>>> mock.x_coords = HoldsData.__dict__['x_coords']
>>> mock.x_coords(mock)
[0, 2, 4]

This way you'll have full control over the x_coords input and output (either by side effect or return value).

Note: In py3k you could just do mock.x_coords = HoldsData.x_coords since there are no more unbound methods.

That could also be done in the constructor of the mock object:

MagicMock(data={'points': [(0,1), (2,3), (4,5)]}, x_coords=HoldsData.__dict__['x_coords'])
Community
  • 1
  • 1
Rik Poggi
  • 28,332
  • 6
  • 65
  • 82
  • 1
    3 years on but may be useful to someone: Once you've got the `mock` object, you can just do `holds_data.x_coords(mock)` and save messing around assigning methods to things. – Holloway Nov 04 '15 at 13:16
1

You're basically running into this issue due to this:

A test is not a unit test if:

  • It touches the filesystem

If you wish to follow this rule, you should modify the _parse method. In particular, it should not take a file as input. The task of _parse is to parse the data but where the data comes from is not the concern of that method.

You could have a string that contains the same data which is then passed to _parse. Similarly, the data could come from a database or somewhere else entirely. When _parse only takes the data as input, you can unit test that method in a much easier way.

It would look something like this:

class HoldsData(object):
    def __init__(self, path):
        self.data = {}
        file_data = self._read_data_from_file(path)
        self.data.update(self._parse(file_data))

    def _read_data_from_file(self, path):
        # read data from file
        return data

    def _parse(self, data):
        # do parsing

Clean code leads to clean tests of course. The best case would be to mock the data and to provide _parse with input and then test x_coords later. If that's not possible, I'd leave it as it is. If your six lines of mock code are the only part of the test cases that you're worried about, then you're fine.

Community
  • 1
  • 1
Simeon Visser
  • 118,920
  • 18
  • 185
  • 180
  • This is actually how my real class works. The `__init__` method reads a file in to a list of strings and `_parse` works from that. The problem is that this method does *a lot* and I want to mock out all but the small portion that `x_coords` uses. I'm not sure if I'm doing that part correctly. – Wilduck Apr 02 '12 at 18:28
  • See my edit. My problem is that it is unfeasible to provide a complete fake of my input. I just want to test the behavior of the single method `x_coords`, not the behavior of `_parse` (I have functional tests for that). – Wilduck Apr 02 '12 at 18:36
  • In that case you could to refactor `_parse` and split it into separate functions. Is that possible? You could then provide a part of the input that can be parsed and then you can test `x_coords`. – Simeon Visser Apr 02 '12 at 18:57
  • I can't really break it up in to separate methods. I'm reading in data from a file format where, if you wrote a grammar for it, it would be context sensitive. I basically have to parse it in one shot (I've got a finite state machine that does this *okay*). I'm just looking for the most convenient way to make *all* of that go away for my test of `x_coords`, I want to isolate it as much as possible. If this is actually a very unusual situation (I'm new to mocking, it could be for all I know), I'll accept that I have to make do with my hack, although I'd like to know why this isn't easier =). – Wilduck Apr 02 '12 at 19:26
  • For example, I'm sure this would be easier if I refactor to have `x_coords` be a stand alone function that takes `holds_data` as a parameter. If easier to tests == better design, this would be the way to go. However, it would require the `x_coords` function to know more about the internals of `holds_data` that I would normally be comfortable with. Where should I make the trade off? Cleaner code or cleaner tests? – Wilduck Apr 02 '12 at 19:29
  • Clean code leads to clean tests of course. The best case would be to mock the data and to provide `_parse` with input and then test `x_coords` later. If that's not possible, I'd leave it as it is. If your six lines of mock code are the only part of the test cases that you're worried about, then you're fine :) – Simeon Visser Apr 02 '12 at 19:37
  • As stated in my question, I have tests for `_parse` so I feel like it doesn't need to be executed in any more tests right now. So yeah, it really was only my six lines of test code. If you edit your answer to include that last comment, I'll accept it. Also, thanks =). – Wilduck Apr 02 '12 at 19:41