Python Beautiful Soup arguments

Question

I have this code that fetches some text from a page using BeautifulSoup:

soup = BeautifulSoup(html)
body = soup.find('div', {'id': 'body'})
print(body)

I would like to turn this into a reusable function that takes some HTML text and the arguments describing the tag to match, like the following:

def parse(html, atrs):
    soup = BeautifulSoup(html)
    body = soup.find(atrs)
    return body

But if I make a call like this:

parse(htmlpage, ('div', {'id': 'body'}))

or like this:

parse(htmlpage, ['div', {'id': 'body'}])

I get back only a div element; the id attribute filter ('body') seems to be ignored.

Is there a way to fix this?

score 8 · Accepted Answer · answered Apr 03 '10 at 12:29

def parse(html, *atrs):
    soup = BeautifulSoup(html)
    body = soup.find(*atrs)
    return body

And then:

parse(htmlpage, 'div', {'id':'body'})
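To see why the original version dropped the attribute filter, here is a minimal sketch that does not need BeautifulSoup at all. `find_stub` is a hypothetical stand-in that only assumes, as the code above does, that the first two positional arguments of `soup.find` are the tag name and the attribute dict; `parse_packed` and `parse_unpacked` are illustrative names as well.

```python
def find_stub(name=None, attrs=None):
    """Hypothetical stand-in for soup.find: reports which arguments it received."""
    return {"name": name, "attrs": attrs}


def parse_packed(atrs):
    # Version from the question: the whole tuple arrives as the first argument.
    return find_stub(atrs)


def parse_unpacked(*atrs):
    # Version from this answer: each element becomes its own positional argument.
    return find_stub(*atrs)


packed = parse_packed(("div", {"id": "body"}))
unpacked = parse_unpacked("div", {"id": "body"})

print(packed)    # the tuple ends up in "name", "attrs" stays None
print(unpacked)  # "div" ends up in "name", the dict ends up in "attrs"
```

In the packed call the dict never reaches the `attrs` slot, which matches the behaviour described in the question.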

Eli Bendersky

Thanks for your answer, it worked. I didn't know that one could unpack lists using *; I thought only dicts worked like that, using **. – scott Apr 03 '10 at 12:37
@EliBendersky Great! But any idea how we could unpack a single dictionary key (such as "class" in `del tag["class"]`)? http://stackoverflow.com/questions/41792761/calling-and-using-an-attribute-stored-in-variable-using-beautifulsoup-4 – JinSnow Feb 02 '17 at 21:22

score 3 · Answer 2 · answered Apr 03 '10 at 12:29

I think you just need to add an asterisk here:

body = soup.find(*atrs)

Without the asterisk you are passing a single parameter which is a tuple:

body = soup.find(('div' , {'id':'body'}))

With the asterisk the tuple is expanded out and the statement becomes equivalent to what you want:

body = soup.find('div' , {'id':'body'})

See this article for more information on using the *args notation, and the related **kwargs.
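As a rough illustration of the two notations side by side, the sketch below uses a hypothetical function `show` that simply returns what it was given. The names `positional` and `keyword` are made up for the example; `*` spreads a sequence into positional arguments, and `**` spreads a dict into keyword arguments.

```python
def show(*args, **kwargs):
    # Collect positional arguments into a tuple and keyword arguments into a dict.
    return args, kwargs


positional = ["div", {"id": "body"}]  # spread with a single asterisk
keyword = {"recursive": False}        # spread with a double asterisk

# Equivalent to: show("div", {"id": "body"}, recursive=False)
result = show(*positional, **keyword)
print(result)
```

A wrapper declared as `def parse(html, *args, **kwargs)` could forward both kinds of arguments in the same way, assuming the wrapped call accepts them.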

Mark Byers

++, this is a nice alternative. – Eli Bendersky Apr 03 '10 at 12:35
Thanks for the link, I'm reading it right now. By the way, I had to add two asterisks, both in the parameter list and in the soup.find call. – scott Apr 03 '10 at 12:44