1

My name is Pablo, and this is my first question in this group. After checking others related posts, I´ve decided to make a request, I wonder if there is a way to perform the following.

Let´s suppose I´ve the following dataframe structure:

+----+---------+------------+------------+----------+
|    |   MRBTS | dest       | gw         |   length |
|----+---------+------------+------------+----------|
|  0 |   13004 | 10.104.0.0 | 10.48.0.0  |       16 |
|  1 |   13004 | 10.107.0.0 | 10.45.0.0  |       16 |
|  2 |   13005 | 10.104.0.0 | 10.130.0.0 |        8 |
|  3 |   13005 | 10.102.0.0 | 10.130.0.0 |        8 |
|  4 |   13005 | 0.0.0.0    | 10.110.0.0 |       16 |
+----+---------+------------+------------+----------+

Test DF:

and I want to export into an XML list groupping by MRBTS like following:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="plan" scope="all" name="iprt" id="PlanConfiguration( 7152069 )">
    <header>
      <log dateTime="2020-06-19T07:38:16.000-03:00" action="created" appInfo="PlanExporter">InternalValues are used</log>
    </header>
    <managedObject distName="MRBTS-13004">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.38.0.0</p>
        </item>
        <item>
          <p name="dest">10.107.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.45.0.0</p>
        </item>
      </list>
    </managedObject>
    <managedObject  distName="MRBTS-13005">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">10.102.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">0.0.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.110.0.0</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>

I get this code from another post (How do convert a pandas/dataframe to XML?), but I got stucked while trying to grouping by MRBTS:

import pandas as pd
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})

def func(row):
    xml = ['<list >']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</list>')
    return '\n'.join(xml)



print ('\n'.join(df.apply(func, axis=1)))

And this result:

<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.48.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.107.0.0</field>
  <field name="gw">10.45.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.102.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">0.0.0.0</field>
  <field name="gw">10.110.0.0</field>
  <field name="length">16</field>
</list>

Could you help me with this issue?

Zephyr
  • 11,891
  • 53
  • 45
  • 80

2 Answers2

0

I see key as structuring your data better first for your target xml representation.

  1. groupby MRBTS
  2. custom aggregation to return list of items for attribute list aggregation. I've used a bit of shorthand list comprehension to prepare kwargs that go to agg()
  3. the JSON / dict you now get from this Dataframe is structure equivalently to your target requirement
  4. I'm rusty on XML, haven't done any for 15 years. There may be better libraries for converting JSON to XML. This shows the structure is mostly there. A bit of XSLT would get you there quite easily
  5. I've done a bit of experimentation restructuring dict from pandas.to_dict() before it is sent through json2xml
from json2xml import json2xml
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})

a = {c:lambda x: list(x) for c in df.columns if c!="MRBTS"}
df2 = df.groupby("MRBTS").agg(a).reset_index()


js = [{"distName":mo["MRBTS"], "item":[{"dest":mi, "length":mo["length"][i], "gw":mo["gw"][i]} 
                                       for i, mi in enumerate(mo["dest"])]} 
      for mo in df2.to_dict(orient="records")]
print(json2xml.Json2xml(js, attr_type=False).to_xml())
print(json2xml.Json2xml(df2.to_dict(orient="records"), attr_type=False).to_xml())


output (just first record)

<?xml version="1.0" ?>
<all>
    <item>
        <distName>13004</distName>
        <item>
            <item>
                <dest>10.104.0.0</dest>
                <length>16</length>
                <gw>10.48.0.0</gw>
            </item>
            <item>
                <dest>10.107.0.0</dest>
                <length>16</length>
                <gw>10.45.0.0</gw>
            </item>
        </item>
    </item>
</all>

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
0

Since XML documents are not text documents, avoid building XML with string concatenation. Instead consider building tree with DOM methods using third-party lxml or even built-in module, etree (with slight modifications). For data, iterate through subsets of data frame by MRBTS field:

import lxml.etree as et
import pandas as pd

### STATIC PART OF XML
root = et.Element('raml', {"version": "2.0", "xmlns": "raml20.xsd"})

cmData = et.SubElement(root, "cmData",
                       {"type":"plan", "scope":"all", "name":"iprt", "id":"PlanConfiguration( 7152069 )"})

header = et.SubElement(cmData, "header")
log = et.SubElement(header, "log",
                    {"dateTime":"2020-06-19T07:38:16.000-03:00", "action":"created", "appInfo":"PlanExporter"})
log.text = "InternalValues are used"

### DYNAMIC PART OF XML
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})
# SUBSET ITERATION                 
for i, g in df.groupby("MRBTS"):
    managedObject = et.SubElement(cmData, "managedObject", {"distName":"MRBTS-"+i})
    list = et.SubElement(managedObject, "list")
    
    # BUILD DICTIONARY OUT OF EACH ROW
    d = g.drop('MRBTS', axis='columns').to_dict('index')
    
    for ik, iv in d.items():
        item = et.SubElement(list, 'item')
        for k, v in iv.items():
            p = et.SubElement(item, 'p', {"name":k})
            p.text = v

# OUTPUT TREE
tree = et.ElementTree(root)
tree_out = tree.write("Output.xml",
                      xml_declaration=True, 
                      encoding="UTF-8",
                      pretty_print=True,
                      doctype="<!DOCTYPE raml SYSTEM 'raml20.dtd'>")

Output XML

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData id="PlanConfiguration( 7152069 )" name="iprt" scope="all" type="plan">
    <header>
      <log action="created" appInfo="PlanExporter" dateTime="2020-06-19T07:38:16.000-03:00">InternalValues are used</log>
    </header>
    <managedObject distName="MRBTS-13004">
      <list>
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="gw">10.48.0.0</p>
          <p name="length">16</p>
        </item>
        <item>
          <p name="dest">10.107.0.0</p>
          <p name="gw">10.45.0.0</p>
          <p name="length">16</p>
        </item>
      </list>
    </managedObject>
    <managedObject distName="MRBTS-13005">
      <list>
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="gw">10.130.0.0</p>
          <p name="length">8</p>
        </item>
        <item>
          <p name="dest">10.102.0.0</p>
          <p name="gw">10.130.0.0</p>
          <p name="length">8</p>
        </item>
        <item>
          <p name="dest">0.0.0.0</p>
          <p name="gw">10.110.0.0</p>
          <p name="length">16</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>
Parfait
  • 104,375
  • 17
  • 94
  • 125