
I'm trying to test some Python code that uses urllib2 and lxml.

I've seen several blog posts and Stack Overflow questions about testing urllib2 calls that throw exceptions, but I haven't seen any examples that test successful calls.

Am I going down the correct path?

Does anyone have a suggestion for getting this to work?

Here is what I have so far:

import mox
import urllib
import urllib2
import socket
from lxml import etree

# set up the test
m = mox.Mox()
response = m.CreateMock(urllib.addinfourl)
response.fp = m.CreateMock(socket._fileobject)
response.name = None # Needed because the file name is checked.
response.fp.read().AndReturn("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
response.geturl().AndReturn("http://rss.slashdot.org/Slashdot/slashdot")
response.read = response.fp.read # Needed since __init__ is not called on addinfourl.
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()

# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
# Note: response2.fp.read() and response2.read() do not behave the same, as defined above.
# In [21]: response2.fp.read()
# Out[21]: '<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'
# In [22]: response2.read()
# Out[22]: <mox.MockMethod object at 0x97f326c>
xcontent = etree.parse(response2)

# verify test
m.VerifyAll()

It fails with:

Traceback (most recent call last):
  File "/home/jon/mox_question.py", line 22, in <module>
    xcontent = etree.parse(response2)
  File "lxml.etree.pyx", line 2583, in lxml.etree.parse (src/lxml/lxml.etree.c:25057)
  File "parser.pxi", line 1487, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:63708)
  File "parser.pxi", line 1517, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:63999)
  File "parser.pxi", line 1400, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:62985)
  File "parser.pxi", line 990, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:60508)
  File "parser.pxi", line 542, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:56659)
  File "parser.pxi", line 624, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:57472)
  File "lxml.etree.pyx", line 235, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:6222)
  File "parser.pxi", line 371, in lxml.etree.copyToBuffer (src/lxml/lxml.etree.c:55252)
TypeError: reading from file-like objects must return byte strings or unicode strings

This is because response.read() does not return what I expected: as the notes in the code show, it returns a mox.MockMethod object instead of the XML string.

jmkacz

3 Answers


I wouldn't delve into urllib2 internals at all; I think they're beyond the scope of what you care about. Here's a simple way to do it with StringIO. The key point is that whatever you intend to parse as XML only needs to be file-like in the duck-typing sense; it doesn't need to be an actual addinfourl instance.

import StringIO
import mox
import urllib2
from lxml import etree

# set up the test
m = mox.Mox()
response = StringIO.StringIO("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()

# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
xcontent = etree.parse(response2)

# verify test
m.VerifyAll()
Peter Lyons
  • Thanks Peter. One more twist. What if I also wanted to check the response code? So, if (response2.getcode() == 200): parse; else: raise an exception. – jmkacz Jun 30 '10 at 14:07
  • I added `response.getcode = lambda: 200` after defining response, and it seems to be working. – jmkacz Jun 30 '10 at 14:57
  • OK, great. None of this is going to win any awards for elegance, but it gets the job done. – Peter Lyons Jun 30 '10 at 19:01
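For readers on Python 3, here is a rough equivalent of the StringIO approach, including the getcode() twist from the comments. It is a sketch under stated assumptions, not part of the original answer: it swaps mox for the standard library's unittest.mock, urllib2 for urllib.request, and lxml for xml.etree.ElementTree (so it runs without lxml installed). FakeResponse and fetch_and_parse are hypothetical names.

```python
import io
import unittest
import urllib.request
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)
from unittest import mock

URL = "http://rss.slashdot.org/Slashdot/slashdot"
XML = b'<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'


class FakeResponse(io.BytesIO):
    """Hypothetical stand-in for the object urlopen returns.

    BytesIO already supplies read(), which is all the parser needs;
    getcode() mimics the status-code check discussed in the comments.
    """
    status = 200

    def getcode(self):
        return self.status


def fetch_and_parse(url):
    """Hypothetical code under test: check the status, then parse."""
    response = urllib.request.urlopen(url, timeout=10)
    if response.getcode() != 200:
        raise RuntimeError("unexpected status {0}".format(response.getcode()))
    return ET.parse(response)


class FetchAndParseTest(unittest.TestCase):
    def test_success(self):
        # Replace urlopen for the duration of the block only.
        with mock.patch("urllib.request.urlopen",
                        return_value=FakeResponse(XML)) as urlopen:
            tree = fetch_and_parse(URL)
        urlopen.assert_called_once_with(URL, timeout=10)
        self.assertEqual(tree.getroot().text, "bar")

    def test_bad_status(self):
        response = FakeResponse(XML)
        response.status = 500  # per-instance override of the class default
        with mock.patch("urllib.request.urlopen", return_value=response):
            with self.assertRaises(RuntimeError):
                fetch_and_parse(URL)
```

Run it with `python -m unittest`. As with the StringIO version, nothing here touches urllib internals: the fake only has to look like a file to the parser and offer the one extra method the code under test calls.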

Echoing what Peter said, I would add that you may not need to be concerned with lxml internals any more than with those of urllib2. By mocking lxml.etree, you can completely isolate the code you really need to test: your own. Here's an example that does that and also shows how to use a mock object to test the response.getcode() call.

import mox
from lxml import etree
import urllib2

class TestRssDownload(mox.MoxTestBase):

    def test_rss_download(self):
        expected_response = self.mox.CreateMockAnything()
        self.mox.StubOutWithMock(urllib2, 'urlopen')
        self.mox.StubOutWithMock(etree, 'parse')
        self.mox.StubOutWithMock(etree, 'iterwalk')
        title_elem = self.mox.CreateMock(etree._Element)
        title_elem.text = 'some title'

        # Set expectations 
        urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10).AndReturn(expected_response)
        expected_response.getcode().AndReturn(200)
        etree.parse(expected_response).AndReturn('some parsed content')
        etree.iterwalk('some parsed content', tag='{http://purl.org/rss/1.0/}title').AndReturn([('end', title_elem),])

        # Code under test
        self.mox.ReplayAll()
        self.production_code()

    def production_code(self):
        response = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
        response_code = response.getcode()
        if 200 != response_code:
            raise Exception('Houston, we have a problem ({0})'.format(response_code))
        tree = etree.parse(response)
        for ev, elem in etree.iterwalk(tree, tag='{http://purl.org/rss/1.0/}title'):
            # Do something with elem.text
            print('{0}: {1}'.format(ev, elem.text))
Jacob Wan
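A Python 3 sketch of the same "mock the parser too" idea might look like the following. It is an adaptation, not the answerer's code: it assumes unittest.mock instead of mox, urllib.request instead of urllib2, and xml.etree.ElementTree instead of lxml. Because ElementTree has no iterwalk(), the hypothetical rss_titles function uses tree.iter(tag) as the closest standard-library analogue.

```python
import unittest
import urllib.request
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)
from unittest import mock

URL = "http://rss.slashdot.org/Slashdot/slashdot"
TITLE_TAG = "{http://purl.org/rss/1.0/}title"


def rss_titles(url):
    """Hypothetical production code, adapted from production_code()."""
    response = urllib.request.urlopen(url, timeout=10)
    response_code = response.getcode()
    if response_code != 200:
        raise Exception("Houston, we have a problem ({0})".format(response_code))
    tree = ET.parse(response)
    # iter(tag) stands in for lxml's iterwalk(tree, tag=...).
    return [elem.text for elem in tree.iter(TITLE_TAG)]


class TestRssDownload(unittest.TestCase):
    # The bottom decorator is applied first, so urlopen is the first argument.
    @mock.patch("xml.etree.ElementTree.parse")
    @mock.patch("urllib.request.urlopen")
    def test_rss_download(self, urlopen, parse):
        # Neither the network nor the real parser is touched.
        response = urlopen.return_value
        response.getcode.return_value = 200
        parse.return_value.iter.return_value = [mock.Mock(text="some title")]

        self.assertEqual(rss_titles(URL), ["some title"])

        # Check the interactions after the fact.
        urlopen.assert_called_once_with(URL, timeout=10)
        parse.assert_called_once_with(response)
        parse.return_value.iter.assert_called_once_with(TITLE_TAG)

    @mock.patch("xml.etree.ElementTree.parse")
    @mock.patch("urllib.request.urlopen")
    def test_bad_status(self, urlopen, parse):
        urlopen.return_value.getcode.return_value = 404
        with self.assertRaises(Exception):
            rss_titles(URL)
        parse.assert_not_called()  # parsing is skipped on a bad status
```

One design difference worth knowing: mox records expectations up front and fails on unexpected calls, whereas unittest.mock accepts any call and leaves it to you to assert on the recorded calls afterwards.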

It looks like your failure isn't related to mox at all: the line causing the error is reading from response2, which is a direct call to Slashdot. Perhaps inspect that object and see what its content is?

EDIT: I didn't see the m.StubOutWithMock(urllib2, 'urlopen') line above, so I thought you were comparing two calls, one mocked (response) and one not (response2). My updated suggestion is in the comments below.

Anthony Briggs
  • If you look at urllib.py, self.read is set equal to self.fp.read, so these two calls should return the same data. As the comments in my code show, self.fp.read is returning a string, while self.read is returning <mox.MockMethod object at 0x97f326c>. This is because `__init__` is not called on addinfourl, so I added the method assignment to my test code, and it doesn't return what I expect. – jmkacz Jun 30 '10 at 00:44
  • What happens if you define it explicitly? That is, instead of `response.read = response.fp.read`, use `response.read().AndReturn("""bar""")`. It's possible that mox is doing some sort of magic behind the scenes when you call the .read() method. – Anthony Briggs Jul 01 '10 at 03:32
  • I had tried that, but the low-level library calls read with the number of bytes it wants, so you can't really mock out read in this case. – jmkacz Jul 01 '10 at 20:26
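That last point is the crux: the parser pulls data with read(n), so a hand-written stub has to accept a size argument, return byte strings, and eventually return an empty string. The sketch below shows such a stub under two assumptions: ChunkedFakeResponse is a hypothetical name, and it uses the standard library's xml.etree.ElementTree in place of lxml so it runs without lxml installed. A stub whose read() ignored size and returned the full body on every call would never signal end of file to a parser that reads until it gets an empty result; with lxml, one that returns a non-string object triggers the TypeError shown in the question.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

XML = b'<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'


class ChunkedFakeResponse(object):
    """Hypothetical file-like stub that implements only read(size)."""

    def __init__(self, body):
        self._buf = io.BytesIO(body)
        self.sizes_requested = []  # record the byte counts the parser asks for

    def read(self, size=-1):
        # Honor the requested byte count and always return bytes;
        # an empty result (b"") tells the parser it has reached the end.
        self.sizes_requested.append(size)
        return self._buf.read(size)


fake = ChunkedFakeResponse(XML)
root = ET.parse(fake).getroot()
print(root.tag, root.text)   # foo bar
print(fake.sizes_requested)  # whatever byte counts this parser requested
```

In practice io.BytesIO (or StringIO.StringIO on Python 2) already behaves this way, which is why the StringIO answer works without any custom read().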