39

My problem is a general one: how to chain a series of attribute lookups when one of the intermediate ones might return None. Since I ran into it while using Beautiful Soup, I'm going to ask it in that context.

Beautiful Soup parses an HTML document and returns an object that can be used to access the structured content of that document. For example, if the parsed document is in the variable soup, I can get its title with:

title = soup.head.title.string

My problem is that if the document doesn't have a title, then soup.head.title returns None and the subsequent string lookup throws an exception. I could break up the chain as:

x = soup.head
x = x.title if x else None
title = x.string if x else None

but this, to my eye, is verbose and hard to read.

I could write:

title = soup.head and soup.head.title and soup.head.title.string

but that is verbose and inefficient.

One solution I've thought of, which I think is possible, would be to create an object (call it nil) that returns None for any attribute lookup. This would allow me to write:

title = ((soup.head or nil).title or nil).string

but this is pretty ugly. Is there a better way?
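
For reference, a minimal sketch of what such a nil object might look like. The names `_Nil`, `Node` and `get_title` are made up for illustration, and `Node` only stands in for a parsed tag; it is not part of Beautiful Soup.

```python
class _Nil(object):
    """Hypothetical null object: falsy, and every attribute lookup yields None."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, which is every ordinary name here.
        return None

    def __bool__(self):        # Python 3 truth test
        return False
    __nonzero__ = __bool__     # Python 2 truth test

nil = _Nil()


class Node(object):
    """Stand-in for a parsed tag (an assumption, not a bs4 class)."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def get_title(soup):
    # `or nil` swaps a missing (None) link for the null object.
    return ((soup.head or nil).title or nil).string


with_title = Node(head=Node(title=Node(string="Foo")))
without_title = Node(head=Node(title=None))
without_head = Node(head=None)

print(get_title(with_title))     # Foo
print(get_title(without_title))  # None
print(get_title(without_head))   # None
```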

Justin
David Hull
  • 2
    Maybe keep your code and catch+handle the `AttributeError` exception in cases where `None` is returned. – crayzeewulf Mar 07 '13 at 19:57
  • What do you want to have it return? – mgilson Mar 07 '13 at 19:58
  • 1
    [`Maybe` monad in Python](http://stackoverflow.com/questions/8507200/maybe-kind-of-monad-in-python). See also [Monads in Python (with nice syntax!)](http://www.valuedlessons.com/2008/01/monads-in-python-with-nice-syntax.html) – jfs Mar 07 '13 at 20:01
  • Having it return `None` if any of the intermediate tags or attributes (that is, method calls) return `None` is fine. – David Hull Mar 07 '13 at 20:01
  • I doubt there's going to be a general, lightweight, non-ugly, non-verbose way to do this. The general (but not lightweight) way is to write your own class that wraps BeautifulSoup so it does what you want. The lightweight (but ugly or verbose) way is to do something like what you're already doing. – BrenBarn Mar 07 '13 at 20:03
  • 1
    I agree. And I think @crayzeewulf's is the most pythonic way. – shx2 Mar 07 '13 at 20:08
  • Note that these aren't method calls. They're attribute lookups (which translate to method calls, but it's syntactically different). – mgilson Mar 07 '13 at 20:10

7 Answers

25

The most straightforward way is to wrap in a try...except block.

try:
    title = soup.head.title.string
except AttributeError:
    print("Title doesn't exist!")

There's really no reason to test at each level when removing each test would raise the same exception in the failure case. I would consider this idiomatic in Python.
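
As a minimal sketch of that point, a single `except AttributeError` covers a failure at any level of the chain. `SimpleNamespace` objects stand in for the parsed document here (an assumption for illustration, not Beautiful Soup's API), and `title_of` is a made-up helper name.

```python
from types import SimpleNamespace


def title_of(soup):
    """Return soup.head.title.string, or None if any link in the chain is missing."""
    try:
        return soup.head.title.string
    except AttributeError:
        # Raised for a missing attribute and for a lookup on None alike.
        return None


full = SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string="Foo")))
no_title = SimpleNamespace(head=SimpleNamespace(title=None))
no_head = SimpleNamespace(head=None)

print(title_of(full))      # Foo
print(title_of(no_title))  # None
print(title_of(no_head))   # None
```

One caveat: an `AttributeError` raised inside a property somewhere along the chain would be swallowed too, so it is worth keeping the `try` body as small as this.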

jeffknupp
18

You might be able to use reduce for this:

>>> class Foo(object): pass
... 
>>> a = Foo()
>>> a.foo = Foo()
>>> a.foo.bar = Foo()
>>> a.foo.bar.baz = Foo()
>>> a.foo.bar.baz.qux = Foo()
>>> 
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux'],a)
<__main__.Foo object at 0xec2f0>
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux','quince'],a)
''

In Python 3, reduce is no longer a builtin and has to be imported from functools (`from functools import reduce`) :(
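
A hedged sketch of the same idea for Python 3, wrapped in a helper (the name `get_chain` is made up for illustration). It uses `None` rather than `''` as the fallback, because a string fallback has attributes of its own.

```python
from functools import reduce  # also importable on Python 2.6+


def get_chain(obj, names, default=None):
    """Follow the attribute names in order; fall back to default when one is missing."""
    return reduce(lambda o, name: getattr(o, name, default), names, obj)


class Foo(object):
    pass


a = Foo()
a.foo = Foo()
a.foo.bar = Foo()
a.foo.bar.baz = "leaf"

print(get_chain(a, ['foo', 'bar', 'baz']))   # leaf
print(get_chain(a, ['foo', 'nope', 'baz']))  # None
```

With `''` as the fallback, a later name such as `'title'` would resolve to the string method `str.title` instead of signalling a miss, so `get_chain(a, ['nope', 'title'], '')` returns a bound method.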


I suppose you could also do this with a simpler function:

def attr_getter(item, attributes):
    for a in attributes:
        try:
            item = getattr(item,a)
        except AttributeError:
            return None #or whatever on error
    return item

Finally, I suppose the nicest way to do this is something like:

try:
    title = foo.bar.baz.qux
except AttributeError:
    title = None
mgilson
  • 2
    `reduce` is available as `functools.reduce` from 2.6 onwards - so an import probably wouldn't hurt that much anyway... – Jon Clements Mar 07 '13 at 20:08
  • 3
    I find this solution far uglier than the "verbose" solutions proposed in the question. – Jon-Eric Mar 07 '13 at 20:09
  • @Jon-Eric -- It's not pretty, although I've posted a slightly prettier version if it helps at all. Ultimately, it's not a pretty problem to have to solve and both of these will scale up to large numbers of attributes better than the "ugly" solutions above. Of course, if the attributes are nested that deeply anyway you start to question if your data structure is right, but ... That's for OP to worry about :) – mgilson Mar 07 '13 at 20:14
  • @crazeewulf -- It can easily be packed into a function which then hides the ugliness and you never need to look at it again. Part of programming is realizing when a task is going to be inherently ugly, and burying the ugliness under a pretty interface. – mgilson Mar 07 '13 at 20:16
  • @mgilson Sorry, I removed my comment because your reply to Jon-Eric's question appeared simultaneously and it addressed my question as well. In any case, for other readers, my question was: Could you please elaborate on why the above solution (i.e. the use of `reduce`) is better than the options that were already rejected by the OP? – crayzeewulf Mar 07 '13 at 20:20
  • I'm accepting this answer both because of its completeness (several suggested solutions) and because it suggested using `try ... except` before jknupp's answer. – David Hull Mar 08 '13 at 22:12
  • Pylance raises an error: "xxxxxxx" is not a known member of "None" Pylance (reportOptionalMemberAccess) even if I use `try ... except` – Michael Boñon Oct 21 '22 at 15:53
1

One solution would be to wrap the outer object inside a Proxy that handles None values for you. See below for a beginning implementation.

import unittest

class SafeProxy(object):

    def __init__(self, instance):
        self.__dict__["instance"] = instance

    def __eq__(self, other):
        return self.instance==other

    def __call__(self, *args, **kwargs):
        return self.instance(*args, **kwargs)

    # TODO: Implement other special members

    def __getattr__(self, name):
        if hasattr(self.__dict__["instance"], name):
            return SafeProxy(getattr(self.instance, name))

        if name=="val":
            return lambda: self.instance

        return SafeProxy(None)

    def __setattr__(self, name, value):
        setattr(self.instance, name, value)


# Simple stub for creating objects for testing
class Dynamic(object):
    def __init__(self, **kwargs):
        for name, value in kwargs.items():  # items() works on Python 2 and 3
            self.__setattr__(name, value)

    def __setattr__(self, name, value):
        self.__dict__[name] = value


class Test(unittest.TestCase):

    def test_nestedObject(self):
        inner = Dynamic(value="value")
        middle = Dynamic(child=inner)
        outer = Dynamic(child=middle)
        wrapper = SafeProxy(outer)
        self.assertEqual("value", wrapper.child.child.value)
        self.assertEqual(None, wrapper.child.child.child.value)

    def test_NoneObject(self):
        self.assertEqual(None, SafeProxy(None))

    def test_stringOperations(self):
        s = SafeProxy("string")
        self.assertEqual("String", s.title())
        self.assertEqual(type(""), type(s.val()))

if __name__=="__main__":
    unittest.main()

NOTE: I am personally not sure whether I would use this in an actual project, but it makes an interesting experiment and I put it here to get people's thoughts on it.

TAS
  • This is a clever solution, and probably what I had in mind when I asked the question. This solution ends up being pretty heavyweight, and also has the disadvantage that all the attribute accesses are done even when one of the intermediate ones returns None and could potentially short-circuit the expression evaluation. – David Hull Mar 08 '13 at 22:35
1

I'm running Python 3.9

Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)]

and the `and` keyword solves my problem:

memo[v] = short_combo and short_combo.copy()

From what I gather, this is not considered pythonic and you should handle the exception instead.
However, in my solution the None ambiguity exists within the function, and in that scenario I think it would be poor practice to handle an exception that occurs ~50% of the time.
Were I outside the function and calling it, I would handle the exception.

  • 1
    To anyone wondering, the reason this works is that the boolean operators in Python return one of the actual objects used as operands, without converting them to booleans. That is, `x and y` is `x` if `bool(x)` is `False`, `y` otherwise. Thus, if `x` is `None`, `x and x.attr` evaluates to `None` without an exception (due to short-circuiting). – Anakhand May 23 '22 at 12:12
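
A small sketch of that short-circuit behaviour, including one caveat: any falsy operand stops the chain, not only `None`. The `Box` class is hypothetical and exists only for this demonstration; `list.copy()` needs Python 3.3 or later.

```python
class Box(object):
    """Hypothetical container used only for this demonstration."""
    def __init__(self, attr):
        self.attr = attr


x = None
print(x and x.attr)            # None: x.attr is never evaluated

x = Box("value")
print(x and x.attr)            # value

# Caveat: an empty list is falsy, so `and` returns it as-is instead of a copy.
items = []
shortcut = items and items.copy()
print(shortcut is items)       # True

# An explicit None test avoids treating other falsy values as missing.
explicit = items.copy() if items is not None else None
print(explicit is items)       # False
```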
0

Here is another potential technique, which hides the assignment of the intermediate value in a method call. First we define a class to hold the intermediate value:

class DataHolder(object):
    def __init__(self, value=None):
        self.v = value

    def g(self):
        return self.v

    def s(self, value):
        self.v = value
        return value

x = DataHolder(None)

Then we use it to store the result of each link in the chain of calls:

import bs4

for html in ('<html><head></head><body></body></html>',
             '<html><head><title>Foo</title></head><body></body></html>'):
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = bs4.BeautifulSoup(html, 'html.parser')
    print(x.s(soup.head) and x.s(x.g().title) and x.s(x.g().string))
    # or
    print(x.s(soup.head) and x.s(x.v.title) and x.v.string)

I don't consider this a good solution, but I'm including it here for completeness.

David Hull
0

This is how I handled it, with inspiration from @TAS and the question "Is there a Python library (or pattern) like Ruby's andand?":

class Andand(object):
    def __init__(self, item=None):
        self.item = item

    def __getattr__(self, name):
        try:
            item = getattr(self.item, name)
            return item if name == 'item' else Andand(item)
        except AttributeError:
            return Andand()

    def __call__(self):
        return self.item


title = Andand(soup).head.title.string()
Community
reubano
0

My best shot at handling null intermediate attributes like this is to use pydash, as in this sample code (originally on repl.it):

import pydash
title = pydash.get(soup, 'head.title.string', None)
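
If pulling in pydash is not an option, a rough standard-library approximation of the dotted-path lookup might look like this. `get_path` is a made-up name, the sketch handles attribute access only, and `SimpleNamespace` objects stand in for the parsed document.

```python
from operator import attrgetter
from types import SimpleNamespace


def get_path(obj, path, default=None):
    """Resolve a dotted attribute path such as 'head.title.string'."""
    try:
        # attrgetter accepts dotted names and walks them left to right.
        return attrgetter(path)(obj)
    except AttributeError:
        return default


soup_like = SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string="Foo")))
untitled = SimpleNamespace(head=SimpleNamespace(title=None))

print(get_path(soup_like, 'head.title.string'))       # Foo
print(get_path(untitled, 'head.title.string'))        # None
print(get_path(untitled, 'head.title.string', 'n/a')) # n/a
```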
Nam G VU