0

I have the below XML file that need to be converted to csv with "value" fields only. there are a few hundred possibilities that are different for each XML type/version.

I've tried using below code in python3 and getting error as "AttributeError: 'NoneType' object has no attribute 'text'".

import pandas as pd
import xml.etree.ElementTree as et
xtree = et.parse("data.xml")
xroot = xtree.getroot()
output=[]
for node in xroot:
   v = node.find("value").text
   output.append(v);

out_df = pd.DataFrame(output, columns = ["value"])
out_df.to_csv('output.csv')

my data.xml input is as below

<?xml version="1.0" encoding="UTF-8"?>
<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
</records>

expected output is to export all "value" to csv file below.

RESGJG,PY,rock.dsjjgds.cm,7937973,20171049979,201704059739793,973979i,normal,dshhkdhs,sdidydakyd2133@10.10.10.1,NotPresent,sip:+47668384,sip:+08779379972,sip:+07073873772@10.0.0.1,sip:+878379739,sip:+937973962,blshahd,ctr,part,7973,67367672,797397

Please help, i have tried various python references, but no luck.

Maria628
  • 224
  • 3
  • 19
  • Please edit the question to show what you've tried, and show a specific roadblock you're running into with [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). For more information, please see [How to Ask](https://stackoverflow.com/help/how-to-ask). – Andreas Oct 10 '19 at 03:33

2 Answers2

0

Try this ."values" is the column you want to filtered from xml and to csv.

import pandas as pd
import xml.etree.ElementTree as et 

xtree = et.parse("data.xml")
xroot = xtree.getroot()
output=[]

for node in xroot:
   values = node.find("values").text
   output.append(values);

out_df = pd.DataFrame(output, columns = ["values"])

out_df.to_csv('output.csv')

wenjia
  • 1
  • 2
  • Hi, welcome to SO, please do explain your answer with some code – Sammy J Oct 10 '19 at 03:45
  • Hi wenjia, thanks for help.. i got below error trying above code. Traceback (most recent call last): File "test.py", line 9, in values = node.find("values").text AttributeError: 'NoneType' object has no attribute 'text' – Maria628 Oct 10 '19 at 04:33
  • by the way, i am using python3, dont know if it matters or not. – Maria628 Oct 10 '19 at 04:36
  • Hi Sai, sorry you try change "values" to "value" – wenjia Oct 10 '19 at 07:08
  • Hi, yes I tried modifying values to value. but no luck. I am getting "AttributeError: 'NoneType' object has no attribute 'text'", I came to know from online forums, that it will throw this error if specified tag is not found... Not Sure how to resolve it. – Maria628 Oct 10 '19 at 07:40
0

Below

import pprint
import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
</records>'''

data = []
root = ET.fromstring(xml)
groups = root.findall('.//group')
for group in groups:
  data.append([f.attrib['value'] for f in group.findall('./field')])
pprint.pprint(data)

output

[['RESGJG',
  'PY',
  'rock.dsjjgds.cm',
  '7937973',
  '20171049979',
  '201704059739793',
  '973979i',
  'normal'],
 ['dshhkdhs',
  'sdidydakyd2133@10.10.10.1',
  'NotPresent',
  'sip:+47668384',
  'sip:+08779379972',
  'sip:+07073873772@10.0.0.1',
  'sip:+878379739',
  'sip:+937973962'],
 ['blshahd', 'ctr', 'part', '7973', '67367672', '797397']]
balderman
  • 22,927
  • 7
  • 34
  • 52
  • Hi thanks a lot... though my requirement is almost fulfilled with this, may I know how I can put these values together in one line.. 'RESGJG', 'PY','rock.dsjjgds.cm', '7937973','20171049979','201704059739793', '973979i', 'normal','dshhkdhs','sdidydakyd2133@10.10.10.1', 'NotPresent', 'sip:+47668384', 'sip:+08779379972', 'sip:+07073873772@10.0.0.1', 'sip:+878379739', 'sip:+937973962','blshahd', 'ctr', 'part', '7973', '67367672', '797397'. – Maria628 Oct 10 '19 at 14:07
  • Not sure I understand what are you looking for... Currently the data is a list that contains 3 lists. Do you want to have one flat list? – balderman Oct 10 '19 at 14:09
  • @Sai See https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists OR replace `data.append` with `data.extend` – balderman Oct 10 '19 at 14:23