0

I have an .xml file as the following:

<?xml version="1.0" encoding="UTF-8" ?>
  <Act  bind="a" >
    <Key frame="32" value="1" />
    <Key frame="62" value="8" />
    <Key frame="106" value="14" />    
  </>
  <Act  bind="b" >
    <Key frame="32" value="1" />
    <Key frame="62" value="30" />
    <Key frame="106" value="30" />    
  </>
  <Act  bind="c" >
    <Key frame="28" value="81" />
    <Key frame="58" value="78" />
    <Key frame="102" value="78" />    
  </>
</>

How can I write an script that saves these values in an excel file, where I have as many columns as binds and as many rows as frames. Also, frames are sorted ascending and if a bind doesn't have a value for a frame, 0 is inserted. Something like the following:

      a         b         c   
28    0         0         81
32    1         1         0
58    0         0         78
62    8         30        0
102   0         0         78
106   14        30        0

Any help is very much appreciated.

devpolo
  • 2,487
  • 3
  • 12
  • 28
user9805040
  • 73
  • 1
  • 9
  • 1
    What have you tried so far? – Gilles Quénot Mar 07 '23 at 11:12
  • @GillesQuénot I tried the answers for [this question](https://stackoverflow.com/questions/25276672/how-to-extract-xml-specific-value-fields-and-list-them). They are not exactly what I need, but can be used as a start point. However, I couldn't even get the values printed in one line. – user9805040 Mar 07 '23 at 13:31
  • BTW why one should get a negative reputation for asking a question that doesn't know how to solve?! – user9805040 Mar 07 '23 at 13:36
  • There's no search/try effort. You didn't provides what you have tried. SO is not a free coding service. See [how to Ask](https://stackoverflow.com/help/how-to-ask) – Gilles Quénot Mar 07 '23 at 16:29
  • Looks like a requirement from you(r) boss – Gilles Quénot Mar 07 '23 at 17:01
  • @GillesQuénot I get your point and sorry if you felt bad. My focus was on clarifying my problem as much as possible and avoiding confusing people with my failed tries. But next time I'll add them too. – user9805040 Mar 08 '23 at 09:38

1 Answers1

1

As mentioned by @GillesQuénot, this is really not the way to ask questions on SO, so please try to do better next time.

Nevertheless, it's an interesting enough an assignment, so I tried to tackle it. First off, your xml in the question is not well formed so I fixed it (see below) to what I believe it should be.

Once we're past that, it becomes a somewhat acrobatic task requiring the use of lxml, xpath, list comprehensions, f-strings and several pandas methods - but it's doable. [Note for those who may wonder - FWIW, I tried but failed to achieve the same thing using pandas.read_xml()].

So all together:

#your xml, fixed
source = """<?xml version="1.0" encoding="UTF-8"?>
<root>
   <Act bind="a">
      <Key frame="32" value="1" />
      <Key frame="62" value="8" />
      <Key frame="106" value="14" />
   </Act>
   <Act bind="b">
      <Key frame="32" value="1" />
      <Key frame="62" value="30" />
      <Key frame="106" value="30" />
   </Act>
   <Act bind="c">
      <Key frame="28" value="81" />
      <Key frame="58" value="78" />
      <Key frame="102" value="78" />
   </Act>
</root>
"""
from lxml import etree
doc = etree.XML(source.encode())
#you need to convert to int so it can be properly sorted later:
frames = [int(frm) for frm in doc.xpath('//Key/@frame')]
#find the unique values and sort them
uni_frames = sorted(set(frames))

#start preparing your dataframe
rows, cols = [],[]
for act in doc.xpath('//Act'):
    #get the column names
    cols.append(act.xpath('./@bind')[0])
    row = []
    for frame in uni_frames:
        #get the frame values
        val = f'.//Key[@frame={frame}]/@value'
        entry = act.xpath(val)[0] if act.xpath(val) else "0"
        row.append(entry)
    rows.append(row)

#now assemble the dataframe itself
df = pd.DataFrame(rows)
df = df.transpose()
#rename the columns
df.columns = cols
#finally, insert the first column
df.insert(loc = 0,
          column = 'frame',
          value = uni_frames)
df

Output:

  frame a   b   c
0   28  0   0   81
1   32  1   1   0
2   58  0   0   78
3   62  8   30  0
4   102 0   0   78
5   106 14  30  0

Now that you have the dataframe, you can use other pandas methods to save to Excel or to CSV, etc.

Jack Fleeting
  • 24,385
  • 6
  • 23
  • 45