Web Scraping using BeautifulSoup [alt] [python]

Question

I am given some HTML, and the part that particularly interests me looks like this:

<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? 
termin=265036" class="link with-img"> <img 
src="/go/resources/main/img//download/img- 
14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w 
Twierdzy Kłodzko" class=""/>

I know that it looks a bit messy, but I have to deal with it anyhow.
My job is to extract the text of the alt attribute.
So for the code above, the output should look like this:

>> Majówka w Twierdzy Kłodzko

I have read a lot of useful information here about extracting data by searching for CSS classes or HTML tags. However, I didn't find anything about alt. I would appreciate any help.

Here's my code after some changes

import requests
from bs4 import BeautifulSoup
url = 'https://www.wroclaw.pl/go/wydarzenia/rozrywka/eventy'
soup = BeautifulSoup(requests.get(url).content, "html.parser")
print(soup.a.img.attrs["alt"])

The output is: AttributeError: 'NoneType' object has no attribute 'attrs'
What am I doing wrong?

score 1 · Answer 1 · answered May 06 '18 at 16:07

Use the attribute name as a key on the tag to get the required value.

Example:

from bs4 import BeautifulSoup
s = """<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? 
termin=265036" class="link with-img"> <img 
src="/go/resources/main/img//download/img- 
14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w 
Twierdzy Kłodzko" class=""/>"""
soup = BeautifulSoup(s, "html.parser")
print(soup.a.img["alt"])  # or: print(soup.a.img.attrs["alt"])

Output:

Majówka w Twierdzy Kłodzko
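
One thing worth keeping in mind with the key lookup above: indexing a tag with a missing attribute raises KeyError, while `.get()` returns None. A minimal sketch, assuming a made-up two-image snippet (the second `<img>` and the file names are hypothetical, not from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <img> deliberately has no alt attribute.
html = (
    '<a href="#"><img src="a.jpg" alt="Majówka w Twierdzy Kłodzko"/></a>'
    '<a href="#"><img src="b.jpg"/></a>'
)
soup = BeautifulSoup(html, "html.parser")
imgs = soup.find_all("img")

first_alt = imgs[0]["alt"]        # key lookup works when the attribute exists
missing_alt = imgs[1].get("alt")  # .get() returns None instead of raising

try:
    imgs[1]["alt"]                # key lookup on a missing attribute
    raised = False
except KeyError:
    raised = True

print(first_alt, missing_alt, raised)
```

If some images on a page might lack alt, `.get("alt")` is the safer of the two.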

answered May 06 '18 at 16:07

Rakesh


Thank you! It truly works. I know it's off topic, but I need to download the whole page, not only the fragment `s`. What is the best way to do it? Using **urllib3**? – Hendrra May 06 '18 at 16:14
You are welcome :). You can use the `urllib3` or `requests` modules. – Rakesh May 06 '18 at 16:15
Ok. Thanks again! I don't know if I understand it - so **BeautifulSoup** needs a string, doesn't it? – Hendrra May 06 '18 at 16:16
This link should help you. https://stackoverflow.com/questions/39757805/using-python-requests-and-beautiful-soup-to-pull-text They use `requests` with `beautifulsoup` – Rakesh May 06 '18 at 16:19
Thanks. It did help, however my code still does not work. I would appreciate your looking at it. – Hendrra May 06 '18 at 16:32
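
On the error from the question's full-page code: `soup.a` is shorthand for `soup.find("a")`, so it returns only the first `<a>` in the document. On a whole page that is presumably a navigation link with no `<img>` inside, so `soup.a.img` is None and `.attrs` fails. BeautifulSoup also accepts either a string or bytes, so `requests.get(url).content` is fine as input. A minimal sketch, assuming a stand-in page (the markup below is hypothetical, not the real wroclaw.pl page) that searches every `<img>` carrying an alt attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. In real use this would be
# requests.get(url).content, as in the question.
page = """
<html><body>
<a href="/home">Home</a>
<a href="/go/wydarzenia/x" class="link with-img">
  <img src="x.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/></a>
<a href="/go/wydarzenia/y" class="link with-img">
  <img src="y.jpg" alt="Second event"/></a>
</body></html>
"""
soup = BeautifulSoup(page, "html.parser")

# The first <a> has no <img> child, which reproduces the NoneType problem.
print(soup.a.img)  # None

# Search the whole document for <img> tags that have an alt attribute.
alts = [img["alt"] for img in soup.find_all("img", alt=True)]
print(alts)
```

`find_all("img", alt=True)` matches only tags where the attribute is present, so the key lookup inside the list comprehension cannot raise.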

score 1 · Answer 2 · answered May 06 '18 at 16:09

You can use BeautifulSoup:

from bs4 import BeautifulSoup as soup
s = '<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? termin=265036" class="link with-img"> <img src="/go/resources/main/img//download/img- 14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/>'
alt = soup(s, 'lxml').find('img')['alt']  # the 'lxml' parser must be installed separately

Output (the value of alt, shown as a Python 2 repr):

u'Maj\xf3wka w Twierdzy K\u0142odzko'
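
Note that `find('img')` returns the first `<img>` anywhere in the markup, which on a full page may be a logo or banner. If only the event links matter, a CSS selector can narrow the search. A rough sketch, assuming the event anchors keep the `with-img` class seen in the question (the surrounding page markup is hypothetical); `select` relies on the soupsieve package, which recent bs4 releases install as a dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical page: a logo image plus one event link like the one in the question.
page = """
<div><img src="logo.png" alt="Logo"/></div>
<a href="/go/wydarzenia/x" class="link with-img">
  <img src="x.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/></a>
"""
soup = BeautifulSoup(page, "html.parser")

# find('img') picks up the logo, not the event image.
first_img_alt = soup.find("img")["alt"]

# Only <img> tags with an alt attribute inside <a class="... with-img ...">.
event_alts = [img["alt"] for img in soup.select("a.with-img img[alt]")]

print(first_img_alt)
print(event_alts)
```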
