
I have this code that fetches some text from a page using BeautifulSoup

soup = BeautifulSoup(html)
body = soup.find('div', {'id': 'body'})
print(body)

I would like to turn this into a reusable function that takes some HTML text and the tags to match, like the following:

def parse(html, atrs):
    soup = BeautifulSoup(html)
    body = soup.find(atrs)
    return body

But if I make a call like this

parse(htmlpage, ('div', {'id': 'body'}))

or like this

parse(htmlpage, ['div', {'id': 'body'}])

I get only the div element; the id='body' attribute filter seems to be ignored.

Is there a way to fix this?

scott

2 Answers

def parse(html, *atrs):
    soup = BeautifulSoup(html)
    body = soup.find(*atrs)
    return body

And then:

parse(htmlpage, 'div', {'id':'body'})
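
If the query is already stored in a tuple or list, the same asterisk can be used at the call site. The following is only a sketch: `fake_find` is a hypothetical stand-in for `soup.find` (so the example runs without BeautifulSoup installed), and it just reports which arguments it received.

```python
# Hypothetical stand-in for soup.find; it only reports its arguments.
def fake_find(name=None, attrs=None):
    return {"name": name, "attrs": attrs}


def parse(html, *atrs):
    # html is unused in this sketch; a real version would build the soup from it.
    return fake_find(*atrs)


# Arguments can be written out directly or unpacked from a stored tuple.
query = ('div', {'id': 'body'})
direct = parse('<html></html>', 'div', {'id': 'body'})
stored = parse('<html></html>', *query)

print(direct == stored)
```

Both calls hand the same two positional arguments to the wrapped function, so the two results compare equal.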
Eli Bendersky
  • Thanks for your answer, it worked. I didn't know that one could unpack lists using *; I thought only dicts worked like that, using **. – scott Apr 03 '10 at 12:37
  • @EliBendersky Great! But any idea how we could unpack a single dictionary key (such as "class" in `del tag["class"]`)? http://stackoverflow.com/questions/41792761/calling-and-using-an-attribute-stored-in-variable-using-beautifulsoup-4 – JinSnow Feb 02 '17 at 21:22

I think you just need to add an asterisk here:

body = soup.find(*atrs)

Without the asterisk you are passing a single parameter which is a tuple:

body = soup.find(('div' , {'id':'body'}))

With the asterisk the tuple is expanded out and the statement becomes equivalent to what you want:

body = soup.find('div' , {'id':'body'})

See this article for more information on using the *args notation, and the related **kwargs.
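
As a rough illustration of the difference, the sketch below uses a hypothetical helper `show` that simply returns whatever positional and keyword arguments it was given. The `options` dict is likewise made up for the example and is not part of the original question.

```python
def show(*args, **kwargs):
    # Return the arguments exactly as the function received them.
    return args, kwargs


query = ('div', {'id': 'body'})
options = {'limit': 1}  # hypothetical keyword arguments

without_star = show(query)           # one positional argument: the whole tuple
with_star = show(*query)             # two positional arguments
with_both = show(*query, **options)  # ** unpacks a dict into keyword arguments

print(len(without_star[0]))
print(len(with_star[0]))
print(with_both[1])
```

Without the asterisk the function sees a single argument (the tuple itself); with it, the tuple's items arrive as separate arguments.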

Mark Byers