Test if an attribute is present in a tag in BeautifulSoup

Question

I would like to get all the <script> tags in a document and then process each one based on the presence (or absence) of certain attributes.

E.g., for each <script> tag, if the attribute for is present do something; else if the attribute bar is present do something else.

Here is what I am doing currently:

outputDoc = BeautifulSoup(''.join(output))
scriptTags = outputDoc.findAll('script', attrs = {'for' : True})

This gives me only the <script> tags that have the for attribute, so I lose the others (those without the for attribute).

"but the if ... in doesn't work"? What does that mean? Syntax error? What do you mean by "doesn't work"? Please be very specific on what's going wrong. — S.Lott, Feb 16 '11 at 11:03
Do you want to test for the presence of an attribute in _any_ tag, _all_ tags or treat each occurrence of the tag separately? — Chinmay Kanchi, Feb 16 '11 at 12:42

score 145 · Accepted Answer · edited May 31 '15 at 22:52

If I understand correctly, you want all the script tags, and then to check each one for certain attributes?

scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()
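
For reference, here is a self-contained sketch of that loop applied to the question's two-attribute case. The sample HTML and the handler functions handle_for and handle_bar are made up for illustration, and html.parser is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = """
<script for="window" event="onload">a()</script>
<script bar="1">b()</script>
<script>c()</script>
"""

def handle_for(tag):
    # Placeholder handler for tags that carry a 'for' attribute.
    return ("for", tag["for"])

def handle_bar(tag):
    # Placeholder handler for tags that carry a 'bar' attribute.
    return ("bar", tag["bar"])

soup = BeautifulSoup(html, "html.parser")
results = []
for script in soup.find_all("script"):  # one pass over every <script>
    if script.has_attr("for"):
        results.append(handle_for(script))
    elif script.has_attr("bar"):
        results.append(handle_bar(script))
    else:
        results.append(("neither", None))

print(results)
```

find_all is called once, and every tag is classified inside the loop, so the tags without the for attribute are not lost.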

edited May 31 '15 at 22:52

Sadık

answered Feb 16 '11 at 14:15

Lucas S.

I'm unable to do something like: if 'some_attribute' in script? That's what I'm after, and I want to avoid calling findAll again and again... – LB40 Feb 16 '11 at 14:20

To check for available attributes you must use Python dict methods, e.g. script.has_key('some_attribute') – Lucas S. Feb 16 '11 at 14:29

How do I check if the tag has any attributes? While tag.has_key('some_attribute') works fine, tag.keys() throws an exception ('NoneType' object is not callable). – Georg Pfolz Apr 08 '13 at 14:00

Found it: tag.attrs is the dictionary! – Georg Pfolz Apr 08 '13 at 14:18

Please update this post; has_key is deprecated. Use has_attr instead. – RvdK Mar 31 '14 at 15:02

Sadly, this did not work for me. Maybe `soup_response.find('err').string is not None` can be used for other attributes too... – im_infamous Aug 25 '18 at 14:03

score 47 · Answer 2 · edited Oct 31 '19 at 07:10

You don't need any lambdas to filter by attribute; you can simply pass some_attribute=True to find or find_all.

script_tags = soup.find_all('script', some_attribute=True)

# or

script_tags = soup.find_all('script', {"some-data-attribute": True})
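
One caveat for the question's own attribute: for is a reserved word in Python, so it cannot be written as a keyword argument (find_all('script', for=True) is a syntax error). The dictionary form works for it, just as it does for hyphenated names. A small sketch, with made-up sample HTML and html.parser assumed as the parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = (
    '<script for="window"></script>'
    '<script data-role="x"></script>'
    '<script></script>'
)
soup = BeautifulSoup(html, "html.parser")

# 'for' is a Python keyword, so it has to go through the attrs dictionary.
with_for = soup.find_all("script", attrs={"for": True})

# Hyphenated names are not valid Python identifiers either.
with_data = soup.find_all("script", {"data-role": True})

print(len(with_for), len(with_data))
```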

Here are more examples with other approaches as well:

import bs4

soup = bs4.BeautifulSoup(html, "html.parser")

# Find all with a specific attribute

tags = soup.find_all(src=True)
tags = soup.select("[src]")

# Find all meta with either name or http-equiv attribute.

soup.select("meta[name],meta[http-equiv]")

# Find all tags with either a name or a src attribute.

soup.select("[name], [src]")

# find first/any script with a src attribute.

tag = soup.find('script', src=True)
tag = soup.select_one("script[src]")

# Find all tags with a name attribute beginning with foo
# or a src attribute beginning with /path
soup.select('[name^="foo"], [src^="/path"]')

# Find all tags with a name attribute containing foo
# or a src attribute containing whatever
soup.select('[name*="foo"], [src*="whatever"]')

# Find all tags with a name attribute ending with foo
# or a src attribute ending with whatever
soup.select('[name$="foo"], [src$="whatever"]')
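
To try a few of these selectors end to end, here is a minimal sketch against a made-up document (html.parser is an assumed parser choice). Attribute values are quoted in the selector: that is always valid CSS, and it avoids parse errors for values such as /path that are not plain identifiers.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = """
<meta name="author" content="x">
<meta http-equiv="refresh" content="5">
<meta charset="utf-8">
<script src="/path/app.js"></script>
<script>inline()</script>
"""
soup = BeautifulSoup(html, "html.parser")

has_src = soup.select("[src]")                       # any tag with a src attribute
metas = soup.select("meta[name], meta[http-equiv]")  # either attribute present
by_prefix = soup.select('[src^="/path"]')            # src starts with /path
first = soup.select_one("script[src]")               # first match, or None

print(len(has_src), len(metas), len(by_prefix), first["src"])
```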

You can also use regular expressions with find or find_all:

import re
# starting with
soup.find_all("script", src=re.compile("^whatever"))
# contains
soup.find_all("script", src=re.compile("whatever"))
# ends with 
soup.find_all("script", src=re.compile("whatever$"))
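
A runnable sketch of the same three regex filters, using made-up src values and html.parser as an assumed parser. A compiled pattern is matched with search semantics, which is why the pattern without anchors behaves as "contains"; tags that have no src attribute at all are simply not matched.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = """
<script src="/static/app.js"></script>
<script src="https://cdn.example.com/lib.min.js"></script>
<script>inline()</script>
"""
soup = BeautifulSoup(html, "html.parser")

starts = soup.find_all("script", src=re.compile(r"^/static/"))  # starts with
contains = soup.find_all("script", src=re.compile(r"cdn"))      # contains
ends = soup.find_all("script", src=re.compile(r"\.min\.js$"))   # ends with

print([t["src"] for t in starts + contains + ends])
```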

I agree that this should be the accepted answer. I simplified the primary example to make it stand out more. — mihow, Oct 30 '19 at 22:05

score 38 · Answer 3 · edited Jan 26 '21 at 02:45

For future reference, has_key is deprecated in Beautiful Soup 4; use has_attr instead:

scriptTags = outputDoc.find_all('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()

edited Jan 26 '21 at 02:45

answered Aug 01 '13 at 02:32

miah

score 20 · Answer 4 · answered Jul 26 '15 at 16:51

If you only need to get the tag(s) that have certain attribute(s), you can pass a lambda to find or find_all:

soup = bs4.BeautifulSoup(YOUR_CONTENT)

Tags with attribute

tags = soup.find_all(lambda tag: 'src' in tag.attrs)

OR

tags = soup.find_all(lambda tag: tag.has_attr('src'))

Specific tag with attribute

tag = soup.find(lambda tag: tag.name == 'script' and 'src' in tag.attrs)

Etc ...

Thought it might be useful.
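
The same function-based filter can also express absence, which is the half of the question that a some_attribute=True filter cannot cover. A minimal sketch; the sample HTML and the helper name script_without_for are made up for illustration, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = (
    '<script for="window"></script>'
    '<script bar="1"></script>'
    '<script></script>'
)
soup = BeautifulSoup(html, "html.parser")

def script_without_for(tag):
    # Match <script> tags that lack a 'for' attribute.
    return tag.name == "script" and not tag.has_attr("for")

without_for = soup.find_all(script_without_for)
print(len(without_for))
```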

answered Jul 26 '15 at 16:51

SomeGuest

Elegant solutions! – Andor Jun 03 '16 at 14:06

score 3 · Answer 5 · answered Jan 03 '18 at 00:04

You can check whether an attribute is present by passing some_attribute=True:

scriptTags = outputDoc.findAll('script', some_attribute=True)
for script in scriptTags:
    do_something()

answered Jan 03 '18 at 00:04

Charles Ma

score 1 · Answer 6 · answered Sep 20 '16 at 15:28

By using the pprint module you can examine the contents of an element.

from pprint import pprint

pprint(vars(element))

Using this on a bs4 element will print something similar to this:

{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
 'can_be_empty_element': False,
 'contents': [u'\n\t\t\t\tNESNA\n\t'],
 'hidden': False,
 'name': u'span',
 'namespace': None,
 'next_element': u'\n\t\t\t\tNESNA\n\t',
 'next_sibling': u'\n',
 'parent': <h1 class="pie-compoundheader" itemprop="name">\n<span class="pie-description">Bedside table</span>\n<span class="pie-productname size-3 name global-name">\n\t\t\t\tNESNA\n\t</span>\n</h1>,
 'parser_class': <class 'bs4.BeautifulSoup'>,
 'prefix': None,
 'previous_element': u'\n',
 'previous_sibling': u'\n'}

To access an attribute (let's say the class list), use the following:

class_list = element.attrs.get('class', [])

You can filter elements using this approach:

for script in soup.find_all('script'):
    if 'for' in script.attrs:
        # 'for' is present (a plain .get('for') would miss an empty value)
        pass
    elif "myClass" in script.attrs.get('class', []):
        # Has class "myClass"
        pass
    else:
        # Do something else
        pass
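
One subtlety when testing through attrs: .get() returns the attribute's value, so an attribute that is present but empty is falsy. A membership test (or has_attr) checks presence regardless of the value. A small sketch with a made-up tag, assuming html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical tag with an empty 'for' attribute, for illustration only.
soup = BeautifulSoup('<script for=""></script>', "html.parser")
script = soup.find("script")

present = "for" in script.attrs          # presence test: the attribute exists
truthy = bool(script.attrs.get("for"))   # value test: the value is an empty string

print(present, truthy)
```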

score 1 · Answer 7 · answered Aug 07 '21 at 19:09

A simple way to select just what you need.

outputDoc.select("script[for]")
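
The complementary set can be selected the same way with :not(). This is a sketch that assumes a Beautiful Soup version whose select is backed by Soup Sieve (4.7 or later); the sample HTML is made up, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = (
    '<script for="window"></script>'
    '<script bar="1"></script>'
    '<script></script>'
)
soup = BeautifulSoup(html, "html.parser")

with_for = soup.select("script[for]")            # has the attribute
without_for = soup.select("script:not([for])")   # lacks the attribute

print(len(with_for), len(without_for))
```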

answered Aug 07 '21 at 19:09

Eat at Joes
