
My team has puzzled over this issue on and off for weeks now. We have a test suite using LiveServerTestCase that runs all of our Selenium-based tests. One test in particular fails seemingly at random: I could change a comment in a different file and the test would fail, then change some other comment and the test would pass again. We are using the Firefox webdriver for the Selenium tests:

self.driver = Firefox()

We can never reproduce the error when testing locally inside our Docker container. This is most likely because, when tests.py is run outside of Travis CI, a different web driver is used instead of Firefox():

self.driver = WebDriver("http://selenium:4444/wd/hub", desired_capabilities={'browserName':'firefox'})

For local testing, we use a Selenium container.
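To be concrete, the driver setup looks roughly like this (the class name and the SELENIUM_HUB environment variable are simplified placeholders, not our exact code):

import os

from django.test import LiveServerTestCase
from selenium.webdriver import Firefox, Remote


class FilterSearchTests(LiveServerTestCase):

    def setUp(self):
        # SELENIUM_HUB is only set in our local docker-compose environment;
        # it is absent on Travis CI.
        hub = os.environ.get("SELENIUM_HUB")
        if hub:
            # Local: drive Firefox inside the Selenium container.
            self.driver = Remote(
                command_executor=hub,  # e.g. "http://selenium:4444/wd/hub"
                desired_capabilities={"browserName": "firefox"},
            )
        else:
            # Travis CI: drive a local Firefox instance directly.
            self.driver = Firefox()

    def tearDown(self):
        self.driver.quit()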

The test that fails is a series of sub-tests, each of which exercises a filtering search feature that we have; each sub-test runs a different filter query. The sequence of each sub-test is (a rough sketch follows the list):

  1. Find the filter search bar element
  2. Send the filter query (a string, e.g. something like "function = int main()")
  3. Simulate the browser click to execute the query
  4. Assert that the number of returned results matches what is expected for that specific filter (the set of data is the same across all sub-tests)
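In code, each sub-test boils down to something like this helper on the test class (the element IDs, result-row class, and helper name are illustrative placeholders, not our real identifiers):

import time  # used for the sleep workaround described below


def run_filter_subtest(self, query, expected_count):
    """One sub-test: run a single filter query and check the result count."""
    # 1. Find the filter search bar element.
    search_bar = self.driver.find_element_by_id("filter-search")
    search_bar.clear()

    # 2. Send the filter query string, e.g. 'function = int main()'.
    search_bar.send_keys(query)

    # 3. Simulate the browser click that executes the query.
    self.driver.find_element_by_id("filter-submit").click()

    # Current workaround: give the page time to render the results.
    time.sleep(2)

    # 4. Assert the number of returned results matches what is expected
    #    for this filter (the data set is the same for every sub-test).
    results = self.driver.find_elements_by_class_name("result-row")
    self.assertEqual(len(results), expected_count)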

This test usually passes when run in Travis CI, and as noted before, it always passes when run locally. The error cannot be reproduced by interacting with the site manually in a web browser. Once in a while, however, this sort of error will appear in the Travis CI test output:

- Broken pipe from ('127.0.0.1', 39000)
- Broken pipe from ('127.0.0.1', 39313)

The numbers 39000 and 39313 are not always the same; they change every time a new Travis CI build is run. They look like port numbers, though I'm not really sure what they actually are.

We have time.sleep(sec) lines right before fetching the list of results for a filter. Increasing the sleep time usually correlates with a temporary fix of the broken pipe error. However, the test is very fickle, and changing the sleep time probably has little to do with fixing the error at all; there have been times when the sleep was reduced or removed from a sub-test and the test still passed. In any case, as a result of the broken pipe, the filter never gets executed and the assertion fails.

One potentially interesting detail is that regardless of the order of the sub-tests, it is always the first sub-test that fails when the broken pipe error occurs. If, however, the first sub-test passes, then all sub-tests always pass.

So, my question is: what on earth is going on here and how do we make sure that this random error stops happening? Apologies if this is a vague/confusing question, but unfortunately that is the nature of the problem.

Dan K
  • Can you get performance logs out of your application? Is it possible your application is failing? – Arran Aug 21 '15 at 15:55
  • Can't imagine it failing since all other tests pass. Like I said, tests pass 100% of the time outside of Travis and the error cannot be reproduced through normal usage of the site. Within Travis the tests still pass more often than not. What do you mean by performance logs? The app has timing logs relating to database queries but I can't imagine anything like that helping diagnose the issue. – Dan K Aug 21 '15 at 17:07

2 Answers


It looks like your issue may be similar to what this fellow was running into. It is perhaps an issue with your timeouts. You may want to use an explicit wait, or try waiting for a specific element to load before comparing the data. I had a similar issue where my Selenium test tried to poll for an image before the page had finished loading. Like I say, this may not be the same issue, but it could potentially help. Good luck!
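For example, something along these lines, where the locator is just a placeholder for whatever element signals that your results have finished loading:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the results container to show up before
# counting rows; a TimeoutException is raised if it never appears.
WebDriverWait(self.driver, 10).until(
    EC.presence_of_element_located((By.ID, "results"))
)
results = self.driver.find_elements_by_class_name("result-row")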

Tyler
  • Initially I was skeptical of this answer since we had already done a lot of fiddling with explicit waits and such with no real effect. However, it seems that one subtest having a long enough explicit wait (`time.sleep`) finally did the trick. Originally it had no such wait at all. However, after adding a long enough wait and going through 11 Travis builds, there was no failure, so it seems to be fixed! – Dan K Aug 21 '15 at 20:17

I just ran into this myself, and it is caused by Django's built-in server not using Python's logging system. This has been fixed in Django 1.10, which has not been released yet at the time of writing. In my case it is acceptable to leave the messages in the log until it is time to upgrade; that is better than adding timeouts and increasing build time.
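If the noise does bother you, then once you are on 1.10 you should (if I understand the change correctly) be able to quiet it through the new django.server logger in your LOGGING settings, something like:

# Test settings (Django >= 1.10): only let ERROR and above from the
# built-in server through, hiding the broken pipe / request noise.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.server": {
            "handlers": ["console"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}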

Anonymous