How to get data with BeautifulSoup

Question

I would like to scrape data from one web page. My code looks like this:

grad = s.get('https://www.njuskalo.hr/prodaja-kuca/zagreb',headers=header, proxies=proxyDict)
city_soup = BeautifulSoup(grad.text, "lxml")
kvarts = city_soup.find_all(id="locationId_level_1")
print kvarts[0]
print "++++++++++++++++++++++="

for kvart in kvarts[0]:
    print kvart

As a result I get:

<option data-url-alias="/brezovica" value="1247">Brezovica</option>
<option data-url-alias="/crnomerec" value="1248">Črnomerec</option>
<option data-url-alias="/donja-dubrava" value="1249">Donja Dubrava</option>

From each of these option tags I need to extract the data-url-alias and value attributes. How do I do that?

See the [documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes). — Galen, Dec 22 '17 at 08:47

t.m.adam · Answer 1 · 2017-12-22T09:22:24.563

bs4 stores tag attributes in a dictionary so you can select them by name.

for kvart in kvarts[0].find_all('option'):
    print kvart['data-url-alias'], kvart['value']

As Evyatar Meged points out in the comments, this raises a KeyError if the attribute doesn't exist, so if you're not sure it is present, use the .get method instead.

for kvart in kvarts[0].find_all('option'):
    print kvart.get('data-url-alias'), kvart.get('value')

Like dict.get, the tag's .get method returns None if the attribute doesn't exist, or a default value if you pass one as the second argument.
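
For anyone who wants to try this without hitting the site, here is a minimal self-contained sketch in Python 3. It assumes the markup looks like the output in the question; the wrapping select element, the third option (which deliberately lacks data-url-alias), and the use of the built-in "html.parser" instead of lxml are my own assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the output shown in the question. The last
# <option> is a hypothetical entry without data-url-alias, added only to
# demonstrate the missing-attribute case.
html = """
<select id="locationId_level_1">
  <option data-url-alias="/brezovica" value="1247">Brezovica</option>
  <option data-url-alias="/donja-dubrava" value="1249">Donja Dubrava</option>
  <option value="0">Hypothetical entry</option>
</select>
"""

# "html.parser" ships with Python; the question uses "lxml" instead.
soup = BeautifulSoup(html, "html.parser")
select = soup.find(id="locationId_level_1")

rows = []
for option in select.find_all("option"):
    # .get never raises; the second argument is the fallback value.
    alias = option.get("data-url-alias", "")
    value = option.get("value")
    rows.append((alias, value))

print(rows)

# Subscript access on the same tag raises KeyError when the attribute is absent.
last = select.find_all("option")[-1]
try:
    last["data-url-alias"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(missing_raises)
```

Collecting the pairs into a list rather than printing them inside the loop makes it easy to feed them into whatever comes next, such as building the per-district URLs.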

edited Dec 22 '17 at 09:22

answered Dec 22 '17 at 08:53


I think it is better if he used `.get` to avoid `KeyError` being raised. still +1 – Evya Dec 22 '17 at 08:54
@EvyatarMeged you're absolutely right, i'll update. – t.m.adam Dec 22 '17 at 08:56
I knew you would definitely come up with any explanation. So, I was heading to the right direction as well, right? Thank you so much. – robots.txt Sep 07 '19 at 17:46
@robots.txt I think this is one of those situations where selenium may be the most reliable option. I hope I can help more the next time. Cheers! – t.m.adam Sep 07 '19 at 17:52
