0

There is this option when opening an xml file using Excel. You get prompted with the option as seen in the picture Here

It basically open that xml file in a table work and based on the analysis that I have done. It seems to do a pretty good job. This is how it looks after I opened an xml file using excel as a tabel form Here

My Question: I want to convert an Xml into a table from like that feature in Excel does it. Is that possible?

The reason I want this result, is that working with tables inside excel is really easy using libraries like pandas. However, I don’t want to go an open every xml file with excel, show the table and then save it again. It is not very time efficient

This is my XML file

   <?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <SetFile dg="" dg_id="">
        <SetData value="32" />
      </SetFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <SetFile dg="" dg_id="">
        <FileX dg="" axis_pts="2" name="" num="" dg_id="" />
        <FileY unit="" axis_pts="20" name="TOOLS" text_id="23423" unit_id="" />
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="0" />
      </SetFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="OPDATA">
        <Request>225198</Request>
        <Response>343243324234234</Response>
      </RawData>
      <Meaning text_id="434234234">The forth</Meaning>
      <ValueDataset unit="m" unit_id="FEDS">
        <FileX dg="kg" discrete="false" axis_pts="19" name="weight" text_id="SDF3" unit_id="SDGFDS" />
        <SetData xin="sdf" xax="233" value="323" />
        <SetData xin="123" xax="213" value="232" />
        <SetData xin="2321" xax="232" value="23" />
      </ValueDataset>
    </START>
</FINAL>
</ProjectData>
  • Do you want to convert XML into panda's dataframe? see https://stackoverflow.com/q/28259301/5184851 – see-king_of_knowledge Nov 13 '19 at 15:09
  • If you want to convert directly from XML to csv, see https://stackoverflow.com/a/52307498/5184851 – see-king_of_knowledge Nov 13 '19 at 15:20
  • what if I want to convert XML into pandas's dataframe but without decleraing the tags of the attributes . why so? becuase I have slighly different xml files and I dont want to change my code every time I want to convert an xml to pandas dataframe –  Nov 14 '19 at 08:00

1 Answers1

0

So let's say I have the following input.xml file:

<main>
    <item name="item1" image="a"></item>
    <item name="item2" image="b"></item>
    <item name="item3" image="c"></item>
    <item name="item4" image="d"></item>
</main>

You can use the following code:

import xml.etree.ElementTree as ET
import pandas as pd

tree  = ET.parse('input.xml')
tags = [ e.attrib for e in tree.getroot() ] 
df = pd.DataFrame(tags)

# df: 
#     name image
# 0  item1     a
# 1  item2     b
# 2  item3     c
# 3  item4     d

And this should be independent of the number of attributes in a given file.

To write to a simple CSV file from pandas, you can use the to_csv command. See documentation. If it is necessary to be an excel sheet, you can use to_excel, see here.

# Write to csv without the row names
df.to_csv('file_name.csv', index = False)

# Write to xlsx sheet without the row names
df.to_excel('file_name.xlsx', index=False)

UPDATE:

For your XML file, and based on your clarification in the comments, I suggest the following, where all elements in the first level in the tree will be rows, and every attribute or node text will be column:

def has_children(e):
    ''' Check if element, e, has children'''
    return(len(list(e)) > 0)

def has_attrib(e):
    ''' Check if element, e, has attributes'''
    return(len(e.attrib)>0)

def get_uniqe_key(mydict, key):
    ''' Generate unique key if already exists in mydict'''
    if key in mydict:
        while key in mydict:
            key = key + '*'

    return(key)

tree = ET.parse('input2.xml')

root = tree.getroot()

# Get first level:
lvl_one = list(root)

myList = [];
for e in lvl_one:

    mydict  = {}
    # Iterate over each node in level one element
    for node in e.iter():

        if (not has_children(node)) & (node.text != None):
            uniqe_key = get_uniqe_key(mydict, node.tag)
            mydict[uniqe_key] = node.text

        if has_attrib(node):
            for key in node.attrib:
                uniqe_key = get_uniqe_key(mydict, key)
                mydict[uniqe_key] = node.attrib[key]

    myList.append(mydict)

print(pd.DataFrame(myList))  

Notice in this code, I check if the column name exists for each key, and if it exists, I create a new column name by suffixing with '*'.

  • The code runs successfully however nothing is displayed –  Nov 14 '19 at 11:39
  • Did you try `print(df)`? – see-king_of_knowledge Nov 14 '19 at 20:24
  • i did but i asked for help on how to also put them in excel sheet –  Nov 17 '19 at 12:12
  • I tried using the code with my own xml code. I'm getting Empty DataFrame, Columns: [], Index: [0, 1, 2]. I have added my xml file in above. The goal is so it goes inside Start and print their elements and then it goes to "DATAS" and then prints its elements and the same with "DOS". In case the element has nothing leave it blank on the dataframe of just write none –  Nov 18 '19 at 14:49
  • A table will have columns and rows. For the XML given, how the dataframe should look like? Will all the information provided in the XML example be in a single row? – see-king_of_knowledge Nov 19 '19 at 18:31
  • The idea was to have columns of the elements like PData,RawData, Request and then in rows to have what they contain inside.So when RawData appears again the new value will be stored on a new row of that column –  Nov 19 '19 at 19:15
  • `PData` has two attributes `id` and `ce_code`. Is that two rows? or two columns? What about `CAN` and `ESerial`? – see-king_of_knowledge Nov 19 '19 at 19:21
  • two columns. i think is the bette approach –  Nov 19 '19 at 20:04
  • thank you a lot. Can you please help me modify your code so it prints Description,SetData, FileX, FileY and their attributes. I would really appreciate it –  Nov 23 '19 at 12:10