I'm trying to parse XML data in Python and am struggling to extract the values. The data looks like this:

[<generic:Obs>
<generic:ObsDimension value="2020-01-02"/>
<generic:ObsValue value="1.1193"/>
<generic:Attributes>
<generic:Value id="OBS_STATUS" value="A"/>
<generic:Value id="OBS_CONF" value="F"/>
</generic:Attributes>
</generic:Obs>, <generic:Obs>
<generic:ObsDimension value="2020-01-03"/>
<generic:ObsValue value="1.1147"/>
<generic:Attributes>
<generic:Value id="OBS_STATUS" value="A"/>
<generic:Value id="OBS_CONF" value="F"/>
</generic:Attributes>
</generic:Obs>]

I would like to create a pandas DataFrame with columns ['Date', 'Value']. Date should come from the value attribute of <generic:ObsDimension value="2020-01-03"/>, and Value from the value attribute of <generic:ObsValue value="1.1147"/>. When I run this code:

soup = BeautifulSoup(response.text, 'xml')
dates = soup.find_all("ObsDimension")

I get this result:

[<generic:ObsDimension value="2020-01-02"/>,
 <generic:ObsDimension value="2020-01-03"/>,
 <generic:ObsDimension value="2020-01-06"/>,
 <generic:ObsDimension value="2020-01-07"/>,
 <generic:ObsDimension value="2020-01-08"/>,
 <generic:ObsDimension value="2020-01-09"/>]

But how can I get each date together with its corresponding value?

Shawn Hemelstrand
  • Does this answer your question? [How to parse XML and get instances of a particular node attribute?](https://stackoverflow.com/questions/1912434/how-to-parse-xml-and-get-instances-of-a-particular-node-attribute) – baduker Mar 25 '23 at 08:10

1 Answer

Try:

import pandas as pd
from bs4 import BeautifulSoup

xml_doc = '''\
<data>
<generic:Obs>
<generic:ObsDimension value="2020-01-02"/>
<generic:ObsValue value="1.1193"/>
<generic:Attributes>
<generic:Value id="OBS_STATUS" value="A"/>
<generic:Value id="OBS_CONF" value="F"/>
</generic:Attributes>
</generic:Obs>

<generic:Obs>
<generic:ObsDimension value="2020-01-03"/>
<generic:ObsValue value="1.1147"/>
<generic:Attributes>
<generic:Value id="OBS_STATUS" value="A"/>
<generic:Value id="OBS_CONF" value="F"/>
</generic:Attributes>
</generic:Obs>

</data>'''

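# The 'xml' parser requires lxml to be installed
soup = BeautifulSoup(xml_doc, 'xml')

all_data = []
# Walk each <generic:Obs> so the date and value stay paired
for obs in soup.select('Obs'):
    # Read the "value" attribute of the child tags
    date = obs.ObsDimension['value']
    value = obs.ObsValue['value']
    all_data.append({'Date': date, 'Value': value})

# One row per observation; both columns hold strings at this point
df = pd.DataFrame(all_data)
print(df)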

Prints:

         Date   Value
0  2020-01-02  1.1193
1  2020-01-03  1.1147
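
If you would rather not depend on BeautifulSoup, the same pairing can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch under some assumptions: ElementTree is strict about namespaces, so the sample below declares a placeholder xmlns:generic URI (hypothetical, real data declares its own on the root element), and the "{*}" wildcard needs Python 3.8 or later. Converting Value to float is also my addition, not something the question asked for.

```python
import xml.etree.ElementTree as ET

# Sample document. The xmlns:generic URI is a placeholder (assumption);
# the real response declares its own namespace URI.
xml_doc = """\
<data xmlns:generic="http://example.com/generic">
  <generic:Obs>
    <generic:ObsDimension value="2020-01-02"/>
    <generic:ObsValue value="1.1193"/>
  </generic:Obs>
  <generic:Obs>
    <generic:ObsDimension value="2020-01-03"/>
    <generic:ObsValue value="1.1147"/>
  </generic:Obs>
</data>"""

root = ET.fromstring(xml_doc)

rows = []
# "{*}" matches a tag in any namespace (Python 3.8+),
# so the actual namespace URI does not need to be known here.
for obs in root.findall(".//{*}Obs"):
    date = obs.find("{*}ObsDimension").get("value")
    value = float(obs.find("{*}ObsValue").get("value"))
    rows.append({"Date": date, "Value": value})

print(rows)
```

Passing rows to pd.DataFrame(rows) should then give the same two columns as above, with Value stored as floats instead of strings.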
Andrej Kesely