
I'm struggling to write BeautifulSoup queries that combine multiple conditions with AND or OR. From what I've read, I have to use a lambda. As an example, I'm looking for tags that match either "span", {"class":"green"} or tag.name == "h1" on the page http://www.pythonscraping.com/pages/warandpeace.html

I can get them separately using the lambda syntax:
bsObj.findAll(lambda tag: tag.name == "h1") returns the h1
bsObj.findAll(lambda tag: tag.name == "span", {"class":"green"}) returns the green spans

Or I can get all "span" tags and the "h1":
bsObj.findAll(lambda tag: tag.name == "span" or tag.name == "h1") returns the green and red spans as well as the h1

But I can't get the spans of class green or the h1 in one query, as the following code does not give the right result:
bsObj.findAll(lambda tag: tag.name == "span", {"class":"green"} or tag.name == "h1")

Can someone please explain the right way to do it in one query? The goal here is not only to get the result but to understand the syntax. Thanks!

(using Python 3.4)
PS: I think this issue is different from the one here: BeautifulSoup findAll() given multiple classes? as well as a variation of Python BeautifulSoup give multiple tags to findAll (as we want a specific attribute)

PLL

1 Answer


You can access attributes using the tag['<attr_name>'] syntax. Check tag.attrs to see exactly what this dictionary contains. You can search for green through the class attribute. Since class is a multi-valued attribute, its value is normally a list rather than a string, so test for membership:

'green' in tag['class']
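
To see why a membership test is needed rather than ==, here is a minimal sketch. It assumes BeautifulSoup 4 with the built-in "html.parser", and the one-line HTML snippet is a made-up stand-in, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one span with two classes, one span with none.
soup = BeautifulSoup('<span class="green big">x</span><span>y</span>', "html.parser")
with_class, without_class = soup.find_all("span")

print(with_class["class"])             # ['green', 'big'] (a list, not a string)
print(with_class["class"] == "green")  # False: a list never equals a string
print("green" in with_class["class"])  # True: membership test works
print(without_class.get("class", []))  # []: .get() avoids a KeyError when class is absent
```

Note that tag['class'] raises a KeyError on a tag without a class attribute, which is why tag.get('class', []) is the safer form inside a filter that runs on every tag.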

And for your lambda construct, combine the conditions with and and or inside the lambda itself:

lambda t: (t.name == 'span' and 'green' in t.get('class',[])) or t.name == 'h1'
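
Put together as a full query, this could look like the sketch below. It assumes BeautifulSoup 4 with "html.parser"; the HTML is a small hypothetical stand-in for the War and Peace page so the example runs offline, and green_span_or_h1 is just an illustrative name for the same condition as the lambda above. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<html><body>
<h1>War and Peace</h1>
<p><span class="red">Well, Prince</span>
<span class="green">Anna Pavlovna</span>
<span class="green big">the Empress</span>
<span>no class at all</span></p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

def green_span_or_h1(tag):
    # (span AND green) OR h1; .get() keeps class-less tags from raising KeyError.
    is_green_span = tag.name == "span" and "green" in tag.get("class", [])
    return is_green_span or tag.name == "h1"

# The filter function is called once per tag; tags where it returns True are kept.
matches = soup.find_all(green_span_or_h1)
for tag in matches:
    print(tag.name, tag.get("class"), tag.get_text())
```

As for why the attempt in the question does not work: the lambda ends at the first comma, so {"class":"green"} or tag.name == "h1" is evaluated as a separate second argument. A non-empty dict is truthy, so or returns the dict without ever evaluating the h1 test, and the call behaves exactly like findAll(lambda tag: tag.name == "span", {"class":"green"}).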
301_Moved_Permanently
  • Hi Matthias and many thanks for your answer. Can you please provide the full syntax? I've tried: `bsObj.findAll(lambda t: (t.name == "span" and t['class'] == u'green') or t.name == "h1")` and it only returns h1. – PLL Sep 17 '15 at 09:08
  • You need to check first if `tag['class']` returns a single value or a list. According to the standard it should return a list but it might vary. Read the [doc on attributes](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes) and multi-valued attributes to decide which form suits your needs. – 301_Moved_Permanently Sep 17 '15 at 09:15
  • Sorry, I'm a beginner and I just don't get it. It would be great if you could provide a working syntax; maybe I can understand then... – PLL Sep 17 '15 at 10:05
  • What does `for t in bsObj.findAll(name='span'): print(t['class'])` output? Feel free to share only a few lines. – 301_Moved_Permanently Sep 17 '15 at 10:09
  • It uses the multi-valued attributes syntax, then. Answer updated. – 301_Moved_Permanently Sep 17 '15 at 10:16