2

I'm using BeautifulSoup to do some crawling, and want to chain find calls, for example:

soup.find('div', class_="class1").find('div', class_="class2").find('div', class_="class3")

Of course, this breaks whenever one of the divs cannot be found, throwing an

AttributeError: 'NoneType' object has no attribute 'find'

Is there a way to modify NoneType to add a find method such as

class NoneType:
    def find(*args):
        return None

so that I can do something like

thing = soup.find('div', class_="class1").find('div', class_="class2").find('div', class_="class3")
if thing:
    ...  # do more stuff

instead of

thing1 = soup.find('div', class_="class1")
if thing1:
    thing2 = thing1.find('div', class_="class2")
    if thing2:
        thing3 = thing2.find('div', class_="class3")
        # etc.

I think I might be able to do something similar by using a parser with XPath capabilities, but the question is not specific to this use case; it is more about modifying or overriding built-in classes.

kenm
colblitz
  • related: [Can you monkey patch methods on core types in python?](http://stackoverflow.com/q/192649/4279) Spoiler: you shouldn't do it. But it can be done using ctypes hacks, see [forbiddenfruit](https://clarete.github.io/forbiddenfruit). – jfs Mar 10 '14 at 23:49
  • unrelated: `find('div', class_="class1")` can be written as `find("div", "class1")`. The whole expression could be written as css select: `soup.select("div.class1 div.class2 div.class3")` – jfs Mar 10 '14 at 23:57
  • Usually the [Fluent interface](http://en.wikipedia.org/wiki/Fluent_interface) is implemented using a proxy object such as QuerySet in Django, Query in SQLAlchemy: you can chain methods freely and call special methods such as `.all()`, `.first()` at the end to retrieve the final result. – jfs Mar 11 '14 at 00:04
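The proxy idea from the last comment can be sketched without touching NoneType at all. This is only a minimal illustration: `Maybe` and its `get()` method are hypothetical names, and `Node` is a stand-in for a parsed tag (it is not part of BeautifulSoup) so that the example runs on its own.

```python
class Node:
    """Stand-in for a parsed tag; NOT part of BeautifulSoup."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Depth-first search of descendants; returns None on a miss,
        # mirroring how find() reports a missing tag.
        for child in self.children:
            if child.name == name:
                return child
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


class Maybe:
    """Hypothetical proxy: forwards method calls unless the wrapped value is None."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, attr):
        # Only reached for names not defined on Maybe itself (e.g. "find").
        def call(*args, **kwargs):
            if self._value is None:
                return self  # stay empty for the rest of the chain
            return Maybe(getattr(self._value, attr)(*args, **kwargs))
        return call

    def get(self):
        # Unwrap at the end of the chain: the real object or None.
        return self._value


tree = Node("root", [Node("a", [Node("b")])])
found = Maybe(tree).find("a").find("b").get()
missing = Maybe(tree).find("x").find("b").get()
```

With a real soup object the chain would presumably look like `Maybe(soup).find('div', class_="class1").find('div', class_="class2").get()`, followed by the usual `if thing:` check.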

6 Answers

2

Why not use a try/except statement instead (since you cannot modify NoneType)?

try:
    thing = soup.find('div', class_="class1").find('div', class_="class2").find('div', class_="class3")
except AttributeError:
    thing = None  # one of the divs in the chain was missing
else:
    ...  # do more stuff with thing
Justin O Barber
  • I don't think this answers the OP's question, as he specifically wrote 'the question is not specific to this use case and is more about modifying/overriding built in classes' – hivert Mar 10 '14 at 23:00
  • Yeah, I wanted to know about modifying built in classes for future reference, but I like this approach. Thanks :D – colblitz Mar 10 '14 at 23:04
  • 1
    +1 EAFP is suitable in this case to implement [the Maybe monad](http://en.wikipedia.org/wiki/Monad_(functional_programming)#The_Maybe_monad) in Python. – jfs Mar 10 '14 at 23:54
1

You can't modify a built-in class such as NoneType or str:

>>> nt = type(None)
>>> nt.bla = 23
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'NoneType'

Some of them (e.g. str) you can inherit from:

>>> class bla(str):
...      def toto(self): return 1
>>> bla('2123').toto()
1

That's not possible with NoneType, and it wouldn't help you anyway:

>>> class myNoneType(nt):
...      def find(self): return 1
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    type 'NoneType' is not an acceptable base type
hivert
  • 1
    Adding a subclass won't affect the type of the `None` literal though, nor make any of the millions of places that `return` or `return None` use that subclass. –  Mar 10 '14 at 22:55
  • I'm answering the OP's 'but the question is not specific to this use case and is more about modifying/overriding built in classes' – hivert Mar 10 '14 at 22:59
  • Thanks. Is it possible in Ruby? I may have been misremembering what I can do in one or the other. – colblitz Mar 10 '14 at 23:04
  • And that is a general limitation of the approach: You can now construct objects that behave differently but mostly pass as a builtin, but you can't force other people to use that type. –  Mar 10 '14 at 23:04
  • Yes, it is possible in Ruby: `class NilClass; def find; 1; end; end`. – Alp Mar 10 '14 at 23:07
  • [you can modify builtin class](http://stackoverflow.com/questions/22313065/adding-method-to-pythons-nonetype#comment33907155_22313065) – jfs Mar 11 '14 at 00:06
1

One approach might be to define

class FindCaller(object):
    def __init__(self, *a, **k):
        self.a = a
        self.k = k
    def __call__(self, obj):
        return obj.find(*self.a, **self.k)

def callchain(root, *fcs):
    for fc in fcs:
        root = fc(root)
        if root is None: return
    return root

and then do

thing = callchain(soup,
    FindCaller('div', class_="class1"),
    FindCaller('div', class_="class2"),
    FindCaller('div', class_="class3"),
)
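
A more compact variant of the same idea can be sketched with functools.reduce. This is not the code above, just an illustration: `find_chain` is a hypothetical helper that takes each step as an `(args, kwargs)` pair, and `Tag` is a toy stand-in for a BeautifulSoup tag so that the example is self-contained.

```python
from functools import reduce


class Tag:
    """Toy stand-in for a BeautifulSoup tag; only find() is modelled."""

    def __init__(self, name, class_=None, children=()):
        self.name = name
        self.class_ = class_
        self.children = list(children)

    def find(self, name, class_=None):
        # Depth-first search of descendants; None when nothing matches.
        for child in self.children:
            if child.name == name and (class_ is None or child.class_ == class_):
                return child
            hit = child.find(name, class_=class_)
            if hit is not None:
                return hit
        return None


def find_chain(root, *steps):
    """Apply find(*args, **kwargs) for each step; None propagates once a step misses."""
    def step(node, call):
        args, kwargs = call
        return None if node is None else node.find(*args, **kwargs)
    return reduce(step, steps, root)


tree = Tag("html", children=[
    Tag("div", "class1", [Tag("div", "class2", [Tag("div", "class3")])]),
])

hit = find_chain(
    tree,
    (("div",), {"class_": "class1"}),
    (("div",), {"class_": "class2"}),
    (("div",), {"class_": "class3"}),
)
miss = find_chain(
    tree,
    (("div",), {"class_": "class1"}),
    (("div",), {"class_": "nope"}),
    (("div",), {"class_": "class3"}),
)
```

Unlike callchain, this version does not return early; it keeps passing None through the remaining steps, which costs nothing but reads a little less directly.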
glglgl
1

You cannot modify the class, and the real question is why you would want to. None means there is no data, so even if a .find() method existed on NoneType, calling it could only ever give you None back. I would recommend something like this:

try:
    var = soup.find('div', class_="class1").find('div', class_="class2").find('div', class_="class3")
except AttributeError:
    var = None  # no such div: do something else here, or report that it was missing
inuasha
0

You can't inherit from NoneType (the type of None):

>>> class Noneish(type(None)):
...   pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'NoneType' is not an acceptable base type
Russia Must Remove Putin
0

You can't. For good reasons... In fact, NoneType is even less accessible than other built-in types:

type(None).foo = lambda x: x
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# <ipython-input-12-61bbde54e51b> in <module>()
# ----> 1 type(None).foo = lambda x: x

# TypeError: can't set attributes of built-in/extension type 'NoneType'

NoneType.foo = lambda x: x
# ---------------------------------------------------------------------------
# NameError                                 Traceback (most recent call last)
# <ipython-input-13-22af1ed98023> in <module>()
# ----> 1 NoneType.foo = lambda x: x

# NameError: name 'NoneType' is not defined

int.foo = lambda x: x
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# <ipython-input-14-c46c4e33b8cc> in <module>()
# ----> 1 int.foo = lambda x: x

# TypeError: can't set attributes of built-in/extension type 'int'

As suggested above, use a try: ... except AttributeError: clause.
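
If the try/except feels noisy, the standard library's contextlib.suppress expresses the same thing in two lines. The sketch below assumes a toy `Tag` class (a stand-in, not BeautifulSoup) whose find() returns None on a miss, so that it can run by itself.

```python
from contextlib import suppress


class Tag:
    """Toy stand-in for a parsed tag: children are looked up by name."""

    def __init__(self, **children):
        self._children = children

    def find(self, name):
        # Returns the named child, or None when it does not exist.
        return self._children.get(name)


tree = Tag(a=Tag(b=Tag()))

# Miss in the middle of the chain: None.find raises AttributeError,
# which is suppressed, so the default assigned beforehand is kept.
thing = None
with suppress(AttributeError):
    thing = tree.find("a").find("missing").find("c")

# Every step succeeds: the assignment goes through normally.
ok = None
with suppress(AttributeError):
    ok = tree.find("a").find("b")
```

As with a plain except AttributeError, this also hides attribute errors caused by unrelated bugs inside the block, so it is safest to keep the block down to the one chained expression.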

m.wasowski