
I want to pass a Beautiful Soup attribute name (e.g. `class_`, `href`, `id`) in a variable, so I can use it in functions such as this one:

from bs4 import BeautifulSoup
data = '<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'

def removeAttrib(data, tag, **kwargs):
    soup = BeautifulSoup(data, "html.parser")
    for x in soup.find_all(tag, kwargs):
        del x[???]  # should be an equivalent of: del x["class"]
    return soup

kwargs = {"class": "story"}
print(removeAttrib(data, "p", **kwargs))

Expected result:

<p>xxx </p> <p id="2">yyy</p> <p> zzz</p>

MYGz solved the first issue by passing `tag, argdict`, i.e. a dictionary, as arguments to the function. I then found `**kwargs` in this question (to pass the dictionary key and value).

But I could not find a way to handle the `del x["class"]` part: how do I pass the "class" key? I tried `ckey = kwargs.keys()` and then `del x[ckey]`, but it did not work.
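A likely reason the attempt fails: `kwargs.keys()` returns a `dict_keys` view over all keys, not the single string `"class"`, so it cannot stand in for one attribute name. A minimal sketch (the HTML is a trimmed version of `data`) showing that iterating over the dictionary yields each key as a plain string usable with `del`:

```python
from bs4 import BeautifulSoup

kwargs = {"class": "story"}

ckey = kwargs.keys()
print(type(ckey).__name__)  # dict_keys: a view over all keys, not one string
print(list(kwargs))         # ['class']: iterating a dict yields its keys

soup = BeautifulSoup('<p class="story">xxx </p> <p id="2">yyy</p>', "html.parser")
for tag in soup.find_all("p", attrs=kwargs):
    for key in kwargs:      # key is the plain string "class"
        del tag[key]        # same effect as del tag["class"]
print(soup)                 # <p>xxx </p> <p id="2">yyy</p>
```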

PS1: Any idea why `removeAttrib(data, "p", {"class": "story"})` doesn't work? PS2: This is a different topic from this one (it's not a duplicate).
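On PS1, a pure-Python sketch (`takes_kwargs` is a hypothetical stand-in for `removeAttrib`): a parameter declared as `**kwargs` only collects *keyword* arguments, so a dictionary passed positionally is rejected, while `**{...}` unpacks it into keyword arguments, even for a key such as `"class"` that is a reserved word in Python source.

```python
def takes_kwargs(data, tag, **kwargs):
    # **kwargs gathers any keyword arguments into a new dict
    return kwargs

# ** unpacks the dict into keyword arguments; the key "class" is accepted here
# even though class="story" could not be written literally in a call
result = takes_kwargs("<p></p>", "p", **{"class": "story"})
print(result)  # {'class': 'story'}

# Passed positionally, the dict becomes a third positional argument,
# but the function only declares two positional parameters
try:
    takes_kwargs("<p></p>", "p", {"class": "story"})
except TypeError as exc:
    print("TypeError:", exc)
```

A plain dictionary does work as the second positional argument of `find_all`, because that parameter is `attrs`.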

JinSnow

2 Answers


You can pass a dictionary instead:

from bs4 import BeautifulSoup
data = '<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'
soup = BeautifulSoup(data, "html.parser")

def removeAttrib(soup, tag, argdict):
    # argdict is passed positionally as find_all's `attrs` filter
    for x in soup.find_all(tag, argdict):
        x.decompose()  # removes the whole tag, including its contents

removeAttrib(soup, "p", {"class": "story"})
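To show why `decompose()` does not produce the expected result, here is a sketch running this answer's approach on the question's `data`. The matching `<p>` tags disappear along with their text instead of just losing their `class` attribute.

```python
from bs4 import BeautifulSoup

data = '<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'
soup = BeautifulSoup(data, "html.parser")

# find_all returns a list, so destroying tags inside the loop is safe
for x in soup.find_all("p", {"class": "story"}):
    x.decompose()

print(soup)  # only the id="2" paragraph (and the spaces around it) remains
```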
Mohammad Yusuf
  • It's not the answer I was looking for: it doesn't tell me how to refer to an attribute in other cases, though I haven't found any other example. But thank you very much for this elegant solution. In the docs I found "You can access a tag's attributes by treating the tag like a dictionary", and other ways to work with attributes, such as `attrs` (or `class_`, which is actually a shortcut). – JinSnow Jan 23 '17 at 21:21
  • I finally found the example I was looking for. `for x in soup.findAll(tag, argdict ): del x[*key]` Please look at my question edit. – JinSnow Jan 24 '17 at 21:59
  • @Guillaume You didn't put the full question. I updated the answer. That is how you delete the unwanted tags. – Mohammad Yusuf Jan 25 '17 at 03:22
  • Thanks again! But decompose won't work because it destroys the tag and its contents. That's why I used `del x["class"]`. But how do I use it without writing the word `class`? That's the challenge. (I updated my question to focus on that point.) – JinSnow Jan 26 '17 at 14:52
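A sketch of the "treat the tag like a dictionary" approach mentioned in the comments. `tag.attrs` is an ordinary dict, and `class` comes back as a list because Beautiful Soup treats it as a multi-valued attribute. `class_="story"` selects the same tags as `attrs={"class": "story"}`, and attribute names can be looked up from a variable like any dict key:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="story">xxx </p> <p id="2">yyy</p>', "html.parser")
first = soup.p

print(first.attrs)     # {'class': ['story']}: a plain dict of attributes
print(first["class"])  # ['story']: "class" is multi-valued, so it is a list

# class_ (keyword) and attrs={"class": ...} (dict) select the same tags
by_keyword = soup.find_all("p", class_="story")
by_dict = soup.find_all("p", attrs={"class": "story"})
print(by_keyword == by_dict)  # True

# Attribute names can come from a variable; missing ones give None with .get()
for name in ("class", "id", "href"):
    print(name, first.get(name))
```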

All credit to MYGz and commandlineluser:

from bs4 import BeautifulSoup
data = '<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'


def removeAttrib(data, tag, kwargs):
    soup = BeautifulSoup(data, "html.parser")
    for x in soup.find_all(tag, kwargs):
        for key in kwargs:  # iterating a dict yields its keys, e.g. "class"
            x.attrs.pop(key, None)  # attrs: the tag's actual attribute dict
            # del x[key] would also work here, since every matched tag has the key

    print(soup)
    return soup

data = removeAttrib(data, "p", {"class": "story"})
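A possible generalization, not part of the original answer: the hypothetical `strip_attrs` below separates the filter used to *find* tags from the attribute names to *remove* (defaulting to the filter's own keys). It returns a string, and with the default it reproduces the expected result from the question.

```python
from bs4 import BeautifulSoup

data = '<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'


def strip_attrs(html, tag, attrs_filter, remove=None):
    """Hypothetical helper: find `tag` elements matching `attrs_filter`,
    then delete the attributes named in `remove` (default: the filter's keys)."""
    soup = BeautifulSoup(html, "html.parser")
    names = list(attrs_filter) if remove is None else list(remove)
    for x in soup.find_all(tag, attrs=attrs_filter):
        for name in names:
            x.attrs.pop(name, None)  # no error if the attribute is absent
    return str(soup)


print(strip_attrs(data, "p", {"class": "story"}))
# <p>xxx </p> <p id="2">yyy</p> <p> zzz</p>
print(strip_attrs(data, "p", {"id": "2"}))
# <p class="story">xxx </p> <p>yyy</p> <p class="story"> zzz</p>
```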
JinSnow