
I am converting CSV files to XML files using pandas.

This is how I currently handle columns that are missing from my DataFrame `df`:

if field in df.columns:
    ...
    # Assign a value to the xml element
    ...
else:
    # Column is absent from the DataFrame: write the -999 sentinel instead
    xml_data.append('<{0}>{1}</{0}>'.format(field_name, -999))

For instance, if the column (field) `Diameter` does not exist in my DataFrame, the output XML contains the following element:

'<Diameter>-999</Diameter>'
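For reference, a self-contained sketch of the approach above. The toy DataFrame, the `fields` list, and the single-row loop are hypothetical stand-ins for the real CSV data; the string formatting is the same as in the question.

```python
import pandas as pd

# Hypothetical stand-in for a CSV that has no "Diameter" column
df = pd.DataFrame({"Length": [1.5], "Width": [0.2]})

# Hypothetical list of fields the XML output should contain
fields = ["Length", "Width", "Diameter"]

xml_data = []
row = df.iloc[0]  # a single row, for brevity
for field_name in fields:
    if field_name in df.columns:
        # Column present: write its value for this row
        xml_data.append('<{0}>{1}</{0}>'.format(field_name, row[field_name]))
    else:
        # Column missing: fall back to the -999 sentinel
        xml_data.append('<{0}>{1}</{0}>'.format(field_name, -999))

output = "".join(xml_data)
print(output)
# <Length>1.5</Length><Width>0.2</Width><Diameter>-999</Diameter>
```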

Is there a better way of handling this? Does the XML format support NaN values?

Sheldon
  • You can represent a NaN as a Null value in different ways in XML. See related [question](https://stackoverflow.com/questions/774192/what-is-the-correct-way-to-represent-null-xml-elements). – CodeMonkey Jun 19 '21 at 01:00
  • Thanks for your answer, JasonM1. Since I plan to validate my output XML file against an XML schema, I will use the `xsi:nil` format, as suggested in the post you recommended. – Sheldon Jun 19 '21 at 01:05
  • XML is a data transfer format and doesn't care about the values. What matters is what is expected by the code that will read the XML. Find out what that code expects for `NaN` or `null` and provide that. – Jim Garrison Jun 19 '21 at 01:13
  • Do not build XML with strings. Use DOM libraries like `etree` and `lxml`. See [What's so bad about building XML with string concatenation?](https://stackoverflow.com/q/3034611/1422451). BTW, the upcoming pandas v1.3 will have a [`DataFrame.to_xml`](https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.to_xml.html) method. – Parfait Jun 19 '21 at 04:10
  • Thanks for your suggestion, Parfait! I am already using `lxml` to validate my file against an XSD schema. I will check out how to use this toolbox to write an XML file. – Sheldon Jun 21 '21 at 19:31
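Combining the suggestions in the comments: a minimal sketch that builds the elements with the standard-library `xml.etree.ElementTree` (rather than string formatting) and marks missing values with `xsi:nil="true"` instead of `-999`. The helper `dataframe_to_xml`, the `fields` list, the toy DataFrame, and the `records`/`record` element names are hypothetical. `df.reindex(columns=fields)` adds each absent column filled with NaN, so a single `pd.isna` check covers both missing columns and empty cells. Note that an element carrying `xsi:nil="true"` only validates if the schema declares it with `nillable="true"`.

```python
import xml.etree.ElementTree as ET

import pandas as pd

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI_NS)  # serialize the namespace with the "xsi" prefix

# Hypothetical input: the CSV data has no "Diameter" column and one empty cell
df = pd.DataFrame({"Length": [1.5, 2.0], "Width": [0.2, None]})

# Hypothetical list of fields the XML schema expects, in order
fields = ["Length", "Width", "Diameter"]


def dataframe_to_xml(df, fields, root_name="records", row_name="record"):
    """Hypothetical helper: one <record> per row, nil elements for missing data."""
    # Missing columns are added and filled with NaN
    df = df.reindex(columns=fields)
    root = ET.Element(root_name)
    for _, row in df.iterrows():
        record = ET.SubElement(root, row_name)
        for field_name in fields:
            elem = ET.SubElement(record, field_name)
            value = row[field_name]
            if pd.isna(value):
                # Empty element flagged as nil instead of a -999 sentinel
                elem.set(f"{{{XSI_NS}}}nil", "true")
            else:
                # ElementTree escapes special characters such as & and <
                elem.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(dataframe_to_xml(df, fields))
```

Because ElementTree escapes characters such as `&` and `<` in element text, CSV values containing them do not need manual escaping, which is one of the problems with string concatenation that the linked question discusses.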

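With pandas 1.3 or later, the `DataFrame.to_xml` method mentioned in Parfait's comment can do the writing. A minimal sketch with hypothetical column and element names: `reindex` adds the missing column (filled with NaN), and `na_rep` writes a replacement string for missing values (here the question's `-999`). Passing `parser="etree"` uses the standard library, so `lxml` is not needed for this step.

```python
import pandas as pd

# Hypothetical input: the CSV data has no "Diameter" column
df = pd.DataFrame({"Length": [1.5, 2.0], "Width": [0.2, 0.3]})

# Hypothetical list of fields the output is expected to contain
fields = ["Length", "Width", "Diameter"]

# Add any missing column (filled with NaN), then let pandas write the XML
xml = df.reindex(columns=fields).to_xml(
    index=False,           # do not emit the row index as an element
    root_name="records",
    row_name="record",
    na_rep="-999",         # text written for missing values
    parser="etree",        # standard-library backend instead of lxml
)
print(xml)
```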