-1

I have a csv that I need to convert to XML using Python. I'm a novice python dev.

Example CSV data:

Amount,Code
CODE50,1246
CODE50,6290
CODE25,1077
CODE25,9790
CODE100,5319
CODE100,4988

Necessary output XML

<coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>1246</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
<coupon-codes/>

My guess is I have to use pandas to pull the csv in, use pandas.groupby to group the Amount column, then push this into an element/subelement to create the xml, then print/push to a xml file. I can't get the groupby to work and don't know how to then push that into the element, then populate the sub element.

  • 1) *ElementTree is giving me trouble* - please explain 2) What is the logic of the csv to xml conversion - please explain. – balderman Nov 15 '21 at 18:59
  • @balderman unsure I understand the question, I mean its giving me trouble in that I can't get it to work, using the scraps of different code i've pulled off the web. I've had success with direct csv to xml wherein each row is an element, but I can't figure out how to group by a column, and have that spit out the appropriate xml. – Jeremy Altman Nov 15 '21 at 19:07
  • share the code so we can see the *trouble* – balderman Nov 15 '21 at 19:08
  • - no need for any external lib. ElementTree is good enough here. See my answer. – balderman Nov 15 '21 at 19:22

4 Answers4

1

why not use df.to_xml()? This simple api will generate xml from the dataframe

Golden Lion
  • 3,840
  • 2
  • 26
  • 35
0
import xml.etree.ElementTree as ET
import pandas as pd

df = pd.read_csv('path/to/csvfile.csv')

root = ET.Element('root')
for coupon_id, df_group in df.groupby('Amount'):
    coupon_codes = ET.Element('coupon-codes', {'coupon-id': coupon_id})
    for coupon_code in df_group['Code']:
        code = ET.Element('code')
        code.text = str(coupon_code)
        coupon_codes.append(code)
    root.append(coupon_codes)

To convert it to a minified string, use this:

min_xml = ET.tostring(root, encoding='utf8')
print(min_xml)

Output:

<root><coupon-codes coupon-id="CODE100"><code>5319</code><code>4988</code></coupon-codes><coupon-codes coupon-id="CODE25"><code>1077</code><code>9790</code></coupon-codes><coupon-codes coupon-id="CODE50"><code>1246</code><code>6290</code></coupon-codes></root>

To convert it to a formatted string, use this:

ET.indent(root)
pretty_xml = ET.tostring(root, encoding='utf8')
print(pretty_xml)

Output:

<root>
  <coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>6290</code>
  </coupon-codes>
</root>
0

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
  next(f)
  for line in f:
    parts = line.split(',')
    data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
  sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
  for vv in v:
    sub_v = SubElement(sub,'code')
    sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
  tree.write(f, encoding='unicode')
balderman
  • 22,927
  • 7
  • 34
  • 52
0

I can't comment yet, but is better if you put the code that you used for trying to solve the problem.

But the way there is a very similar question here, simple-csv-to-xml-conversion-python

And here is a post for grouping by, group by/cumcount

With this information i achieved it, by doing this

df = pd.DataFrame(['CODE50',
    'CODE50',
    'CODE25',
    'CODE25',
    'CODE100',
    'CODE100'],columns=['code'])

df['amount'] = [1246,
    6290,
    1077,
    9790,
    5319,
    4988]

df['columns'] = df.groupby('code')['amount'].cumcount()
result = df.pivot(index='code', columns='columns')

def convert(df):
    str = ''
    for a in df.index:
        str += '\n'f"""<coupon-codes coupon-id="{a}">"""
        for b in df['amount'].loc[a]:
            str += '\n''       'f"<code>{b}</code>"
    return str
xml = convert(result)
print(xml)

result

<coupon-codes coupon-id="CODE100">
       <code>5319</code>
       <code>4988</code>
<coupon-codes coupon-id="CODE25">
       <code>1077</code>
       <code>9790</code>
<coupon-codes coupon-id="CODE50">
       <code>1246</code>
       <code>6290</code>
Javier Huerta
  • 118
  • 11
  • @user17242583 The output is the same, as the only difference is that is not sorted. I will check your answer as i haven't used xml.etree and change your output because the question has changed – Javier Huerta Nov 16 '21 at 12:38