4

I've familiarized myself with the concept, most notably by watching Raymond Hettinger's excellent video and reading the accepted answer here and I am wondering what I got wrong.

class ReadHTML(object):

    def __init__(self, url):
        page = urlopen(url).read()
        self.page = page

    @classmethod
    def from_file(cls, path):
        page = open(path).read()
        return cls(page)

This works

r = ReadHTML('http://example.com')
print r.page

and this is not

r = ReadHTML.from_file('example.html')
print r.page 

it throws me an error, as if I was trying to "urlopen" a file:

File "/usr/lib/python2.7/urllib2.py", line 258, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: <!doctype html>

Can you see what's wrong?

Community
  • 1
  • 1
nutship
  • 4,624
  • 13
  • 47
  • 64

3 Answers3

5

You are still calling the class initializer, ReadHTML.__init__(), when you call cls(page); that call is no different from calling ReadHTML(page), you are just using a different reference. This method only accepts a url parameter and the code passes that to urlopen() regardless.

Adjust your ReadHTML.__init__() method to handle being passed a page instead of a URL:

class ReadHTML(object):
    def __init__(self, url=None, page=None):
        if url is not None:
            page = urlopen(url).read()
        self.page = page

    @classmethod
    def from_file(cls, path):
        page = open(path).read()
        return cls(page=page)

Now the code supports both paths to produce an instance.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Great, thank you! I use Sublime Text2 (as you do I guess) and when I run `r = ReadHTML.from_file('example.html') print r.page` Sublime's Build console shows me `[Finished in 1.0s] [Finished in 0.7s]` at the bottom of the output. Still the output is correct but do you know where this double finish time come from? I never go it before. Can you check it out? – nutship Feb 26 '14 at 16:19
  • 1
    @nutship: I have no idea; I do use ST, but generally not the build system. – Martijn Pieters Feb 26 '14 at 16:33
  • I know this is quite commonly done, but I believe parameters should not override each other like that. – tawmas Feb 26 '14 at 16:57
  • 1
    @tawmas: The goal was to explain to the OP why their code failed. I'd also work out a better API here, and not necessarily override behaviour like this. It also depends on the typical use-case vs. the convenience class method, etc. – Martijn Pieters Feb 26 '14 at 17:13
2

from_file opens the page, but so does your __init__() constructor, so if you do ReadHTML.from_file('example.html'), you are essentially doing:

page = urlopen(open('example.html').read()).read()

Personally, I prefer Martijn's solution, for semantic clarity, but here is an alternative:

class ReadHTML(object):
    def __init__(self, url, opener=urlopen):
        self.page = opener(url).read()

    @classmethod
    def from_file(cls, path):
        return cls(path, opener=open)

This solution is advantageous because it gives you the capability of defining arbitrary openers (say, for opening files stored in a database).

Community
  • 1
  • 1
Joel Cornett
  • 24,192
  • 9
  • 66
  • 88
  • I like the extra flexibility of being able to override the opener, especially because this is needed to do anything fancy with urllib/urllib2. – tawmas Feb 26 '14 at 16:58
1

I'm not a big fan of optional parameters overriding each other. I would make it so that the default constructor accepts a string and I would have to separate alternate constructors to handle a filename and an URL.

I also modified the filename constructor to explicitly close the file.

class ReadHTML(object):

    def __init__(self, page):
        self.page = page

    @classmethod
    def from_filename(cls, path):
        with open(path) as f:
            page = f.read()
        return cls(page)

    @classmethod
    def from_url(cls, url):
        page = urlopen(url).read()
        return cls(page)

As a side note, I believe urllib/urllib2 support file://, so you would strictly not need the filename constructor (but I still believe it is nice to have).

tawmas
  • 7,443
  • 3
  • 25
  • 24