
I've been working on a fairly large Python project with a number of tests.

Some parts of the application require CPU-intensive testing, and our approach of running every test before each commit stopped making sense.

We've since adopted a tag-based selective testing approach. The problem is that, as the codebase grows, maintaining the tagging scheme becomes cumbersome, and I'd like to start studying whether we could build something smarter.
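
To make that concrete, here is a simplified sketch of this kind of tag-based scheme (pytest-style markers with made-up names, purely for illustration):

# test_pricing.py -- simplified illustration of a tag-based scheme
import pytest

@pytest.mark.cpu_heavy
def test_full_catalogue_reprice():
    # CPU-intensive: exercises the heavy numerical code paths
    ...

@pytest.mark.quick
def test_discount_rounding():
    # cheap unit test, safe to run before every commit
    assert round(0.1 + 0.2, 2) == 0.3

With something like this, pytest -m "not cpu_heavy" runs before a commit and the full suite runs separately.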

In a previous job the test system was such that it only tested code that was affected by the changes in the commit.

It seems like Mighty Moose employs a similar approach for CLR languages. Using these as inspiration, my question is, what alternatives are there (if any) for smart selective testing in Python projects?

In case there aren't any, what would be good initial approaches for building something like that?

Paul D. Waite
Cesar Kawakami
    Think you can diff the bytecode and setup stubs that force execution along paths that have changed? – inspectorG4dget Nov 19 '12 at 04:43
  • 1
    Are you aware of the possible issues resulting from a _selective testing_ approach? A change in one place may break code elsewhere, so it may not be such a reliable approach. You can however test parts of the project just by invoking a specific test case, or by using e.g. `skipIf` decorators (like in the `unittest` module). This is rather a question about test structure, not about some magical tool for executing the test cases that would be affected by a specific change. On the other hand, analyzing the coverage of every test case could help identify which parts are executed by a specific test case (and you could use that). – Tadeck Nov 19 '12 at 04:55
  • Using a git pre-commit hook to get the files that have changed would let you run your Python test suite against only the files changed in that commit. This requires a very strict file/class naming scheme; if someone does not follow it, test selection for that changed code will break. – sean Nov 19 '12 at 04:57
  • 3
    This is about the point that I would consider a test server, some ability to run tests in parallel, a nightly build/test schedule, or some combination of those. Having tests that fail overnight is okay if it happens on a branch where such things are expected. – John Lyon Nov 19 '12 at 05:13
  • I'm aware of separate testing and testing servers and we'll be certainly adopting that in the near future to run full test suites. This also solves the problem with selective testing before commit. The problem is still relevant, though, I think. – Cesar Kawakami Nov 19 '12 at 14:57
  • 1
    have you tried https://github.com/tarpas/pytest-testmon? – madhukar93 May 03 '19 at 07:06

8 Answers


The idea of automating selective testing of parts of your application definitely sounds interesting. However, this feels like something that would be much easier to achieve in a statically typed language; given the dynamic nature of Python, it would probably be a serious time investment to build something that can reliably detect all tests affected by a given commit.

Reading your problem, and putting aside the idea of selective testing, the approach that springs to mind is grouping tests so that you can execute test suites in isolation. That enables a number of useful automated test-execution strategies that shorten the feedback loop, such as:

  • Parallel execution of separate test suites on different machines
  • Running tests at different stages of the build pipeline
  • Running some tests on each commit and others on nightly builds.

Therefore, I think your approach of using tags to partition tests into different 'groups' is a smart one, though as you say managing those tags becomes difficult with a large test suite. Given this, it may be worth focusing time on building tools to aid in the management of your test suite, particularly the management of your tags (see the sketch after this list). Such a system could be built by gathering information from:

  • Test result output (pass/fail, execution time, logged output)
  • Code coverage output

Good luck, it's definitely an interesting problem you are trying to solve, and I hope some of these ideas help you.

robjohncox
    I'm accepting this answer (after a long time, I'm really sorry) because, after some time, it's these strategies that we ended up adopting for our testing approach. We split our projects into more independent parts as our "selective" approach. We still use a tag-based approach for resource-heavy tests, and employ a highly parallel test system to shorten test times. @joe-mcmahon's answer is also valuable, though, and dependency analysis is still something in my mind for the future. – Cesar Kawakami Oct 31 '13 at 18:22

I guess you are looking for a continuous testing tool?

I created a tool that sits in the background and runs only the impacted tests (you will need the PyCharm plugin plus the pycrunch-engine package from pip):

https://github.com/gleb-sevruk/pycrunch-engine

This will be particularly useful if you are using PyCharm.

More details are in this answer: https://stackoverflow.com/a/58136374/2377370

Gleb Sevruk

If you are using unittest.TestCase then you can specify which test files to execute with the pattern parameter of test discovery, and run tests based on the code that changed. Even if you are not using unittest, you should have your tests organised by functional area/module so that you can take a similar approach.
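
For example (a minimal sketch; the tests directory and the pattern are placeholders), discovery can be wrapped in a small script, or run directly with python -m unittest discover -s tests -p "test_billing*.py":

# run_subset.py -- minimal sketch: run only the test files whose names match
# a pattern, using unittest's discovery mechanism.
import sys
import unittest

if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else "test_*.py"
    suite = unittest.defaultTestLoader.discover("tests", pattern=pattern)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    sys.exit(0 if result.wasSuccessful() else 1)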

Optionally (not an elegant solution to your problem), if each developer/group or functional code area committed to a separate branch, you could have that branch executed on your continuous testing environment. Once it has completed (and passed), you can merge it into your main trunk/master branch.

A combination of nightly jobs of all tests and per-branch tests every 15-30 minutes (if there are new commits) should suffice.

aneroid

A few random thoughts on this subject, based on work I did previously on a Perl codebase with similar "full build is too long" problems:

  • Knowing your dependencies is key to making this work. If module A depends on B and C, then you need to test A when either of them is changed. It looks like Snakefood is a good way to get a dictionary that outlines the dependencies in your code; if you take that and translate it into a makefile (rough sketch after this list), then you can simply run "make test" on check-in and all of the dependencies (and only the needed ones) will be rebuilt and tested.

  • Once you have a makefile, work on making it parallel; if you can run a half-dozen tests in parallel, you'll greatly decrease running time.
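
A rough sketch of the Snakefood-to-make translation (assuming sfood still prints one ((from_root, from_file), (to_root, to_file)) tuple per line; the project name and the deps.mk output are hypothetical):

# deps_to_make.py -- rough sketch: turn Snakefood (sfood) output into make
# dependency lines such as "pkg/mod.py: pkg/helpers.py pkg/db.py".
# Hypothetical usage: sfood myproject | python deps_to_make.py > deps.mk
import ast
import sys
from collections import defaultdict

deps = defaultdict(set)
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    (from_root, from_file), (to_root, to_file) = ast.literal_eval(line)
    if to_file:  # a (None, None) target means "no dependency recorded"
        deps[from_file].add(to_file)

for source, needed in sorted(deps.items()):
    print(f"{source}: {' '.join(sorted(needed))}")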

Joe McMahon

If you write the test results to a file, you can then use make or a similar alternative to determine when the tests need to be "rebuilt": make can compare the timestamp of each result file with those of the dependent Python files.

Unfortunately Python isn't too good at determining what it depends on, because modules can be imported dynamically, so you can't reliably look at imports to determine affected modules.

I would use a naming convention to allow make to solve this generically. A naive example would be:

%.test_result : %_test.py
	python $< > $@

This defines a pattern rule that converts a _test.py file into its test result (note that the recipe line must be indented with a tab). Then you can tell make the additional dependencies for your tests, something like this:

my_module_test.py : module1.py module2.py external\module1.py
carpenterjc

Consider turning the question around: which tests need to be excluded to make running the rest tolerable? The CPython test suite in Lib/test excludes resource-heavy tests unless they are specifically requested (as they may be on a buildbot). Some of the optional resources are 'cpu' (time), 'largefile' (disk space), and 'network' (connections). (python -m test -h on 3.x, or test.regrtest on 2.x, gives the whole list.)

Unfortunately, I cannot tell you how to do so as 'skip if resource is not available' is a feature of the older test.regrtest runner that the test suite uses. There is an issue on the tracker to add resources to unittest.

What might work in the meantime is something like this: add a machine-specific file, exclusions.py, containing a list of strings like those above. Then import exclusions and skip tests, cases, or modules if the appropriate string is in the list.
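
A minimal sketch of that idea (the file, class, and resource names are made up):

# exclusions.py -- per-machine list of resources this box should not exercise
EXCLUDED = ['cpu', 'largefile', 'network']

# test_remote_sync.py -- skip a whole case (or individual tests) accordingly
import unittest
import exclusions

@unittest.skipIf('network' in exclusions.EXCLUDED,
                 "network tests are excluded on this machine")
class RemoteSyncTests(unittest.TestCase):
    def test_push_changes(self):
        ...

if __name__ == '__main__':
    unittest.main()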

Terry Jan Reedy

We've run into this problem a number of times in the past and have been able to address it by improving and refactoring the tests. You don't say what your development practices are or how long your tests take to run. I would say that if you are doing TDD, your tests need to run in no more than a few seconds; anything that runs longer than that should be moved to a server. If your tests take longer than a day to run, then you have a real problem, and it will limit your ability to deliver functionality quickly and effectively.

Kozyarchuk

Couldn't you use something like Fabric? http://docs.fabfile.org/en/1.7/