Python convert CSV to XML with pandas groupby

Question

I have a csv that I need to convert to XML using Python. I'm a novice python dev.

Example CSV data:

Amount,Code
CODE50,1246
CODE50,6290
CODE25,1077
CODE25,9790
CODE100,5319
CODE100,4988

Necessary output XML

<coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>1246</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
<coupon-codes/>

My guess is I have to use pandas to pull the csv in, use pandas.groupby to group the Amount column, then push this into an element/subelement to create the xml, then print/push to a xml file. I can't get the groupby to work and don't know how to then push that into the element, then populate the sub element.

1) *ElementTree is giving me trouble* - please explain 2) What is the logic of the csv to xml conversion - please explain. — balderman, Nov 15 '21 at 18:59
@balderman unsure I understand the question, I mean its giving me trouble in that I can't get it to work, using the scraps of different code i've pulled off the web. I've had success with direct csv to xml wherein each row is an element, but I can't figure out how to group by a column, and have that spit out the appropriate xml. — Jeremy Altman, Nov 15 '21 at 19:07
- no need for any external lib. ElementTree is good enough here. See my answer. — balderman, Nov 15 '21 at 19:22

score 1 · Answer 1 · answered Nov 15 '21 at 22:33

1

why not use df.to_xml()? This simple api will generate xml from the dataframe

answered Nov 15 '21 at 22:33

Golden Lion

3,840
2
26
35

score 0 · Answer 2 · 2021-11-15T19:16:20.837

import xml.etree.ElementTree as ET
import pandas as pd

df = pd.read_csv('path/to/csvfile.csv')

root = ET.Element('root')
for coupon_id, df_group in df.groupby('Amount'):
    coupon_codes = ET.Element('coupon-codes', {'coupon-id': coupon_id})
    for coupon_code in df_group['Code']:
        code = ET.Element('code')
        code.text = str(coupon_code)
        coupon_codes.append(code)
    root.append(coupon_codes)

To convert it to a minified string, use this:

min_xml = ET.tostring(root, encoding='utf8')
print(min_xml)

Output:

<root><coupon-codes coupon-id="CODE100"><code>5319</code><code>4988</code></coupon-codes><coupon-codes coupon-id="CODE25"><code>1077</code><code>9790</code></coupon-codes><coupon-codes coupon-id="CODE50"><code>1246</code><code>6290</code></coupon-codes></root>

To convert it to a formatted string, use this:

ET.indent(root)
pretty_xml = ET.tostring(root, encoding='utf8')
print(pretty_xml)

Output:

<root>
  <coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>6290</code>
  </coupon-codes>
</root>

balderman · Accepted Answer · 2021-11-15T19:48:05.700

0

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
  next(f)
  for line in f:
    parts = line.split(',')
    data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
  sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
  for vv in v:
    sub_v = SubElement(sub,'code')
    sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
  tree.write(f, encoding='unicode')

edited Nov 15 '21 at 19:48

answered Nov 15 '21 at 19:14

balderman

22,927
7
34
52

this output is perfect! How do I then write that to a file? I'm trying just root.write('test.xml') but am getting an attributeError 'xml.etree.ElementTree.Element' object has no attribute 'write' – Jeremy Altman Nov 15 '21 at 19:31
Code was updated - the xml document is written to a file. – balderman Nov 15 '21 at 19:48
you're the best. Thanks! – Jeremy Altman Nov 15 '21 at 19:50

score 0 · Answer 4 · answered Nov 15 '21 at 19:52

I can't comment yet, but is better if you put the code that you used for trying to solve the problem.

But the way there is a very similar question here, simple-csv-to-xml-conversion-python

And here is a post for grouping by, group by/cumcount

With this information i achieved it, by doing this

df = pd.DataFrame(['CODE50',
    'CODE50',
    'CODE25',
    'CODE25',
    'CODE100',
    'CODE100'],columns=['code'])

df['amount'] = [1246,
    6290,
    1077,
    9790,
    5319,
    4988]

df['columns'] = df.groupby('code')['amount'].cumcount()
result = df.pivot(index='code', columns='columns')

def convert(df):
    str = ''
    for a in df.index:
        str += '\n'f"""<coupon-codes coupon-id="{a}">"""
        for b in df['amount'].loc[a]:
            str += '\n''       'f"<code>{b}</code>"
    return str
xml = convert(result)
print(xml)

result

<coupon-codes coupon-id="CODE100">
       <code>5319</code>
       <code>4988</code>
<coupon-codes coupon-id="CODE25">
       <code>1077</code>
       <code>9790</code>
<coupon-codes coupon-id="CODE50">
       <code>1246</code>
       <code>6290</code>

@user17242583 The output is the same, as the only difference is that is not sorted. I will check your answer as i haven't used xml.etree and change your output because the question has changed — Javier Huerta, Nov 16 '21 at 12:38

Python convert CSV to XML with pandas groupby

4 Answers4