p tag with/without class and strings

Question

I'm new to Python and I'm trying to understand BeautifulSoup.

I wrote this code. It works, but not the way I want:

for abc in soup.findAll(['p',{'a':re.compile('href="/download/*')}]):
    value=abc.text
    print value

The page has multiple "blocks" like this one:

<div class="">
  <div class="ABC">
    <p>
      <a href="/download/1234/abcde/fghij">String1</a>
    </p>
    <p class="data">
      String2 <a href="/user/4649/abc">String3</a> String2 
    </p>
  </div>
  <img src="/img/abc.png" alt="String4" title="String5" />
</div>

I want to read all of these "blocks" and convert each one to a dictionary(?): {'Link': '/download/1234/abcde/fghij', 'Name': 'String1', 'User': 'String3', 'alt': 'String4', 'title': 'String5'}

With this, I can search for a Name and get its Link.

nickie · Answer 1 · 2013-08-31T21:54:41.930


Try something like this:

for outer in soup.find_all("div", attrs={"class": ""}):
    # The first <a> in the block is the download link.
    a = outer.find("a")
    img = outer.find("img")
    entry = {
        "Link": a.get("href"),
        "Name": a.text,
        # The user link is the <a> inside <p class="data">.
        "User": outer.find("p", "data").find("a").text,
        "alt": img.get("alt"),
        "title": img.get("title"),
    }
    print(entry)

This retrieves the values you want from each block and puts them in a dictionary.
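
Since the goal is to search by Name and get the Link, one option is to collect the dictionaries in a list and then index them by name. This is only a sketch: the `entries` list stands in for what the loop would build, its values are the placeholder strings from the question, and `by_name` is a hypothetical name chosen for illustration.

```python
# Entries as the loop would produce them. The values are the placeholder
# strings from the question, not real scraped data.
entries = [
    {
        "Link": "/download/1234/abcde/fghij",
        "Name": "String1",
        "User": "String3",
        "alt": "String4",
        "title": "String5",
    },
]

# Index the entries by their "Name" field (by_name is a hypothetical name).
# If two blocks share the same name, the later one replaces the earlier one.
by_name = {entry["Name"]: entry for entry in entries}

# Look up a block by name and read its link.
link = by_name["String1"]["Link"]
print(link)
```

If names can repeat across blocks, mapping each name to a list of entries would avoid losing data.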

edited Aug 31 '13 at 21:54

answered Aug 31 '13 at 17:09


Thanks, it worked! I didn't know I could chain two consecutive finds. The correct name is findAll, not find_all. – FernandoG Aug 31 '13 at 18:29
`find_all` works in newer versions of BeautifulSoup, whereas `findAll` works in both newer and older versions, though perhaps not in future ones. See [this question](http://stackoverflow.com/questions/12339323/beautifulsoup-findall-find-all). – nickie Sep 01 '13 at 10:46
