
I need to read XML data (an exchange rate list) from a URL, and the output should be a dictionary. Right now I can only get the first currency. I tried `find_all`, but without success. Can somebody tell me where I need to put a `for` loop to read all the values?

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.xxxy.hr/Downloads/PBZteclist.xml').read()
soup = bs.BeautifulSoup(source,'xml')

name = soup.find('Name').text
unit = soup.find('Unit').text
buyratecache = soup.find('BuyRateCache').text
buyrateforeign = soup.find('BuyRateForeign').text
meanrate = soup.find('MeanRate').text
sellrateforeign = soup.find('SellRateForeign').text
sellratecache = soup.find('SellRateCache').text


devize =  {'naziv_valute': '{}'.format(name),
           'jedinica': '{}'.format(unit),
           'kupovni': '{}'.format(buyratecache),
           'kupovni_strani': '{}'.format(buyrateforeign),
           'srednji': '{}'.format(meanrate),
           'prodajni_strani': '{}'.format(sellrateforeign),
           'prodajni': '{}'.format(sellratecache)}

print ("devize:",devize)

Example of XML:

<ExchRates>
    <ExchRate>
        <Bank>Privredna banka Zagreb</Bank>
        <CurrencyBase>HRK</CurrencyBase>
        <Date>12.01.2019.</Date>
        <Currency Code="036">
            <Name>AUD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,485390</BuyRateCache>
            <BuyRateForeign>4,530697</BuyRateForeign>
            <MeanRate>4,646869</MeanRate>
            <SellRateForeign>4,786275</SellRateForeign>
            <SellRateCache>4,834138</SellRateCache>
        </Currency>
        <Currency Code="124">
            <Name>CAD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,724225</BuyRateCache>
            <BuyRateForeign>4,771944</BuyRateForeign>
            <MeanRate>4,869331</MeanRate>
            <SellRateForeign>4,991064</SellRateForeign>
            <SellRateCache>5,040975</SellRateCache>
        </Currency>
        <Currency Code="203">
            <Name>CZK</Name>
            <Unit>1</Unit>
            <BuyRateCache>0,280057</BuyRateCache>
            <BuyRateForeign>0,284322</BuyRateForeign>
            <MeanRate>0,290124</MeanRate>
            <SellRateForeign>0,297377</SellRateForeign>
            <SellRateCache>0,300351</SellRateCache>
        </Currency>
        ...etc...
    </ExchRate>
</ExchRates>
Edited by Parfait; asked by Damir Švegović.

1 Answer


Iterate through all `Currency` nodes (not the `soup` object itself) and use a list comprehension to build a list of dictionaries, one per currency:

import bs4 as bs

# `source` holds the raw XML bytes fetched with urllib in the question
soup = bs.BeautifulSoup(source, 'xml')

# ALL EXCHANGE RATE NODES: find_all returns every <Currency>, find returns only the first
currency_nodes = soup.find_all('Currency')

# LIST OF DICTIONARIES, one per <Currency> node
devize_list = [{'naziv_valute': c.find('Name').text,
                'jedinica': c.find('Unit').text,
                'kupovni': c.find('BuyRateCache').text,
                'kupovni_strani': c.find('BuyRateForeign').text,
                'srednji': c.find('MeanRate').text,
                'prodajni_strani': c.find('SellRateForeign').text,
                'prodajni': c.find('SellRateCache').text
               } for c in currency_nodes]
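For comparison, the same extraction can be written as an explicit `for` loop, which shows where the loop asked about in the question belongs: around each `Currency` node, not around the whole document. This is a sketch under a few assumptions: it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` (so it runs without `bs4`/`lxml`), parses an inline copy of two currencies from the sample XML instead of fetching the URL, and uses a hypothetical `FIELDS` mapping from the dictionary keys to XML tag names.

```python
import xml.etree.ElementTree as ET

# Two <Currency> entries copied from the sample XML in the question
SAMPLE_XML = """<ExchRates>
  <ExchRate>
    <Bank>Privredna banka Zagreb</Bank>
    <CurrencyBase>HRK</CurrencyBase>
    <Date>12.01.2019.</Date>
    <Currency Code="036">
      <Name>AUD</Name>
      <Unit>1</Unit>
      <BuyRateCache>4,485390</BuyRateCache>
      <BuyRateForeign>4,530697</BuyRateForeign>
      <MeanRate>4,646869</MeanRate>
      <SellRateForeign>4,786275</SellRateForeign>
      <SellRateCache>4,834138</SellRateCache>
    </Currency>
    <Currency Code="124">
      <Name>CAD</Name>
      <Unit>1</Unit>
      <BuyRateCache>4,724225</BuyRateCache>
      <BuyRateForeign>4,771944</BuyRateForeign>
      <MeanRate>4,869331</MeanRate>
      <SellRateForeign>4,991064</SellRateForeign>
      <SellRateCache>5,040975</SellRateCache>
    </Currency>
  </ExchRate>
</ExchRates>"""

# Hypothetical mapping: output dictionary key -> XML child tag
FIELDS = {
    'naziv_valute': 'Name',
    'jedinica': 'Unit',
    'kupovni': 'BuyRateCache',
    'kupovni_strani': 'BuyRateForeign',
    'srednji': 'MeanRate',
    'prodajni_strani': 'SellRateForeign',
    'prodajni': 'SellRateCache',
}

root = ET.fromstring(SAMPLE_XML)

devize_list = []
# Outer loop: one pass per <Currency> element anywhere under the root
for currency in root.iter('Currency'):
    devize = {}
    # Inner loop: read each child tag's text into the renamed key
    for key, tag in FIELDS.items():
        devize[key] = currency.findtext(tag)
    devize_list.append(devize)

print(devize_list)
```

The list comprehension and this loop build the same list; the comprehension is just more compact.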

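Alternatively, since you are extracting every child element, nest a dictionary comprehension inside the list comprehension. Note that the keys will then be the XML tag names (`Name`, `Unit`, ...) rather than your custom keys: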

# skip whitespace text nodes between tags, whose .name is None
devize_list = [{n.name: n.text for n in c.children if n.name is not None}
               for c in currency_nodes]
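Where the dictionary comprehension sits matters. A flat comprehension such as `[{n.name: n.text} for c in ... for n in ...]` produces one single-key dictionary per child element, while the nested form produces one dictionary per currency. The sketch below contrasts the two; it again uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup and a trimmed two-currency XML string based on the sample. Iterating an ElementTree element yields only child elements, so the `is not None` filter needed for bs4's whitespace text nodes is not needed here.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: two <Currency> nodes with three fields each
SAMPLE_XML = """<ExchRates><ExchRate>
  <Currency Code="036"><Name>AUD</Name><Unit>1</Unit><MeanRate>4,646869</MeanRate></Currency>
  <Currency Code="124"><Name>CAD</Name><Unit>1</Unit><MeanRate>4,869331</MeanRate></Currency>
</ExchRate></ExchRates>"""

# './/Currency' matches <Currency> elements at any depth
currency_nodes = ET.fromstring(SAMPLE_XML).findall('.//Currency')

# Flat form: one single-key dict per child element (6 dicts here)
flat = [{n.tag: n.text} for c in currency_nodes for n in c]

# Nested form: one dict per <Currency>, keyed by tag name (2 dicts here)
nested = [{n.tag: n.text for n in c} for c in currency_nodes]

print(len(flat), flat[0])
print(len(nested), nested[0])
```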
Answered by Parfait.
  • I tried to run this and print with `print(next((item for item in devize_list if item["naziv_valute"] == "AUD"), "False Choice"))`. It returns the value of the first dictionary in the list, so there is only the first dictionary. Can you help with that? – Damir Švegović Jan 14 '19 at 19:53
  • Did this solution code raise any error for parsing with `bs4`? See edit to `print` content. – Parfait Jan 14 '19 at 20:21
  • No, I have no errors, just one dictionary in the list: {'naziv_valute': 'AUD', 'jedinica': '1', 'kupovni': '4,501087', 'kupovni_strani': '4,546553', 'srednji': '4,663131', 'prodajni_strani': '4,803025', 'prodajni': '4,851055'} – Damir Švegović Jan 14 '19 at 20:37
  • See edit. I misread your XML because it is not indented properly: the *Currency* nodes repeat, not *ExchRate*. – Parfait Jan 14 '19 at 21:13
  • Still not working. This is the URL with the XML: http://www.pbz.hr/Downloads/PBZteclist.xml – Damir Švegović Jan 14 '19 at 21:53
  • Hmmm... works great on my end. See [PyFiddle demo](https://pyfiddle.io/fiddle/d5372f68-e6bc-4601-956d-ad166569342e/?m=Saved%20fiddle) (be sure to click `Run`). Please note that two versions of parsing are shown there. The nested dictionary comprehension does not rename keys to your custom ones. – Parfait Jan 14 '19 at 22:18
  • What is the right way to search a list of dicts by `naziv_valute`? I tried to follow the example from [https://stackoverflow.com/questions/8653516/python-list-of-dictionaries-search] but it returns errors: `print(next(item for item in devize_list if item["naziv_valute"] == "AUD"))` – Damir Švegović Jan 15 '19 at 07:43
  • Hmmm.. that line works great on my end. See [PyFiddle demo](https://pyfiddle.io/fiddle/d5372f68-e6bc-4601-956d-ad166569342e/?m=Saved%20fiddle) – Parfait Jan 15 '19 at 15:04
  • Yes, I see... `Traceback (most recent call last): File "D:/Users/Damir/Dropbox/Dokumenti/Dora i Marko/Dora skolski zadaci/Python/zadaci/primjer_html_XML_scrapping_4_dictionary_works_OK.py", line 43, in <module> print(next(item for item in devize_list if item["naziv_valute"] == "AUD")) File "D:/Users/Damir/Dropbox/Dokumenti/Dora i Marko/Dora skolski zadaci/Python/zadaci/primjer_html_XML_scrapping_4_dictionary_works_OK.py", line 43, in <genexpr> print(next(item for item in devize_list if item["naziv_valute"] == "AUD")) KeyError: 'naziv_valute'` – Damir Švegović Jan 15 '19 at 22:06
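A `KeyError: 'naziv_valute'` is consistent with running the search against the list built by the nested dictionary comprehension, whose keys are the XML tag names (`Name`, `Unit`, ...) rather than the custom keys. A minimal sketch, assuming hard-coded sample dicts in both shapes and a hypothetical helper `find_currency`, of searching either list and of indexing it once by currency name:

```python
# Hypothetical sample data shaped like each version's output
renamed = [
    {'naziv_valute': 'AUD', 'jedinica': '1', 'srednji': '4,646869'},
    {'naziv_valute': 'CAD', 'jedinica': '1', 'srednji': '4,869331'},
]
by_tag = [
    {'Name': 'AUD', 'Unit': '1', 'MeanRate': '4,646869'},
    {'Name': 'CAD', 'Unit': '1', 'MeanRate': '4,869331'},
]


def find_currency(rows, code, key='naziv_valute'):
    """Return the first dict whose `key` equals `code`, or None.

    row.get(key) returns None for dicts that lack `key`,
    so a key mismatch yields None instead of a KeyError.
    """
    return next((row for row in rows if row.get(key) == code), None)


aud = find_currency(renamed, 'AUD')
cad = find_currency(by_tag, 'CAD', key='Name')
missing = find_currency(by_tag, 'AUD')  # wrong key for this list -> None

# For repeated lookups, build a dict keyed by currency name once
index = {row['naziv_valute']: row for row in renamed}
print(aud, cad, missing, index['CAD'])
```

Using `row.get(key)` is convenient but can hide a key mismatch by quietly returning `None`; plain `item["naziv_valute"]` fails loudly, which is what exposed the problem here.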