
I am given some HTML code; the part that particularly interests me looks like this:

<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? 
termin=265036" class="link with-img"> <img 
src="/go/resources/main/img//download/img- 
14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w 
Twierdzy Kłodzko" class=""/>

I know it looks a bit messy, but I have to deal with it as it is.
My job is to extract the text of the alt attribute.
For the code above, the output should look like this:

>> Majówka w Twierdzy Kłodzko

I have read a lot of useful information here about extracting data by searching for CSS classes or HTML tags. However, I didn't find anything about alt. I would appreciate any help.


Here's my code after some changes:

import requests
from bs4 import BeautifulSoup
url = 'https://www.wroclaw.pl/go/wydarzenia/rozrywka/eventy'
soup = BeautifulSoup(requests.get(url).content, "html.parser")
print(soup.a.img.attrs["alt"])

The output is: AttributeError: 'NoneType' object has no attribute 'attrs'
What am I doing wrong?

Hendrra

2 Answers


Use the attribute name as a key on the tag to get its value.

Example:

from bs4 import BeautifulSoup
s = """<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? 
termin=265036" class="link with-img"> <img 
src="/go/resources/main/img//download/img- 
14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w 
Twierdzy Kłodzko" class=""/>"""
soup = BeautifulSoup(s, "html.parser")
print(soup.a.img["alt"])  # or: print(soup.a.img.attrs["alt"])

Output:

Majówka w Twierdzy Kłodzko
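
The same lookup can be made tolerant of tags that have no `alt` at all. The following is only a minimal sketch: the second `<a>` element and the shortened `href`/`src` values are made up for illustration. It relies on `Tag.get`, which returns `None` instead of raising `KeyError` when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first <img> mirrors the question, the second
# (hypothetical) one deliberately has no alt attribute.
html = (
    '<a href="/x" class="link with-img">'
    '<img src="/a.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/></a>'
    '<a href="/y"><img src="/b.jpg"/></a>'
)
soup = BeautifulSoup(html, "html.parser")
imgs = soup.find_all("img")

first_alt = imgs[0].get("alt")    # attribute present: returns its text
missing_alt = imgs[1].get("alt")  # attribute absent: returns None, no exception

print(first_alt)
print(missing_alt)
```

With `imgs[1]["alt"]` the second lookup would raise `KeyError`, so `get` is the safer choice when not every image is guaranteed to carry the attribute.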
Rakesh
  • Thank you! It truly works. I know it's off topic, but I need to download the whole page, not only the part in `s`. What is the best way to do it? Using **urllib3**? – Hendrra May 06 '18 at 16:14
  • You are welcome :). You can use `urllib3` or `requests` modules – Rakesh May 06 '18 at 16:15
  • Ok. Thanks again! I don't know if I understand it - so **BeautifulSoup** needs a string, doesn't it? – Hendrra May 06 '18 at 16:16
  • This link should help you. https://stackoverflow.com/questions/39757805/using-python-requests-and-beautiful-soup-to-pull-text They use `requests` with `beautifulsoup` – Rakesh May 06 '18 at 16:19
  • Thanks. It did help, however my code still does not work. I would appreciate your looking at it. – Hendrra May 06 '18 at 16:32
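
On the `AttributeError` from the question: `soup.a` returns only the first `<a>` in the document, and on a full page that first link very likely contains no `<img>`, so `soup.a.img` is `None`. A minimal sketch of one way around this is to iterate over all matching links instead. The helper name `extract_alts` and the sample markup are hypothetical, and the `with-img` class is taken from the snippet in the question; whether the live page still uses it is an assumption.

```python
from bs4 import BeautifulSoup


def extract_alts(html):
    """Collect the alt text of images nested in <a class="... with-img"> links."""
    soup = BeautifulSoup(html, "html.parser")
    alts = []
    # class_="with-img" matches any <a> whose class list contains "with-img".
    for link in soup.find_all("a", class_="with-img"):
        # alt=True keeps only <img> tags that actually have an alt attribute.
        img = link.find("img", alt=True)
        if img is not None:
            alts.append(img["alt"])
    return alts


# Made-up page fragment: the first <a> has no <img>, which is the situation
# that makes soup.a.img evaluate to None.
sample = """
<a href="/go/home">Home</a>
<a href="/go/wydarzenia/1" class="link with-img"><img src="/1.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/></a>
<a href="/go/wydarzenia/2" class="link with-img"><img src="/2.jpg"/></a>
<a href="/go/wydarzenia/3" class="link with-img"><img src="/3.jpg" alt="Second event"/></a>
"""

print(extract_alts(sample))
```

For the real page, the same function could be called with `requests.get(url).text` in place of `sample`.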

You can use BeautifulSoup:

from bs4 import BeautifulSoup as soup
s = '<a href="/go/wydarzenia/rozrywka/35826-majowka-w-twierdzy-klodzko? termin=265036" class="link with-img"> <img src="/go/resources/main/img//download/img- 14ab4e372df7bd0826c90f429f0e5933/twierdza-przewodnik-jpg.jpg" alt="Majówka w Twierdzy Kłodzko" class=""/>'
alt = soup(s, 'lxml').find('img')['alt']  # the 'lxml' parser requires the lxml package

Output (Python 2 repr; on Python 3 the value displays as 'Majówka w Twierdzy Kłodzko'):

u'Maj\xf3wka w Twierdzy K\u0142odzko'
Ajax1234