1

I have test data that looks like this: [['81303~E8889~81300~7295~71942', 'AR', '61.43463381', '0', '0', '0', '0', '0', '0', '0', '99212~73080~00378415101~00406035705~63304045830~99202~WC101~29105~A4565~73070~73090~99203'], ['7234', 'AR', '54.29158111', '0', '0', '0', '1', '0', '0', '0', 'E0849']] and I'm trying to get this into XML format for a soap API call, but the issue I'm running into is the tilde Here is what my test csv file looks like:

playcodes,Benefit State,Age at 

DOL,BEHAVIORAL,CARDIAC,DIABETES,HYPERTENSION,OBESE,SMOKER,SUBSTANCE,CPT_codes
81303~E8889~81300~7295~71942,AR,61.43463381,0,0,0,0,0,0,0,99212~73080~00378415101~00406035705~63304045830~99202~WC101~29105~A4565~73070~73090~99203
7234,AR,54.29158111,0,0,0,1,0,0,0,E0849

My code is this:

import csv
import xml.etree.ElementTree as ET
import xml

f = open('medical_test.csv')
next(f)
csv_f = csv.reader(f)   
data = []
for row in csv_f: 
   data.append(row)
f.close()

print(data)
[['81303~E8889~81300~7295~71942', 'AR', '61.43463381', '0', '0', '0', '0', '0', '0', '0', '99212~73080~00378415101~00406035705~63304045830~99202~WC101~29105~A4565~73070~73090~99203'], ['7234', 'AR', '54.29158111', '0', '0', '0', '1', '0', '0', '0', 'E0849']]


def convert_row(row):
    return """
<?xml version="1.0" encoding="UTF-8"?>
<cbcalc>
    <icdcodes>
        %s
    </icdcodes>
    <state>%s</state>
    <country>US</country>
    <clientid>Custom field</clientid>
    <medicalonly></medicalonly>
    <bpcode></bpcode>
    <noicode></noicode>
    <age>%s</age>
    <jobclass>1</jobclass>
    <fulloutput>Y</fulloutput>
    <cfactors>
        <depression>%s</depression>
        <cardiac>%s</cardiac>
        <diabetes>%s</diabetes>
        <hypertension>%s</hypertension>
        <legalrep></legalrep>
        <obesity>%s</obesity>
        <smoker>%s</smoker>
        <subabuse>%s</subabuse>
    </cfactors>
    <prosummary>
        <icd>
            <code></code>
        </icd>
        <ndc>
            <code></code>
        <ndc>
        <cpt>
            <code>%s</code>
        </cpt>
        <hcpcs>
            <code></code>
        </hcpcs>
    </prosummary>
</cbcalc>
""" % (row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8], row[9], row[10])

print('\n'.join([convert_row(row) for row in data[1:]]))

Output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<cbcalc>
    <icdcodes>
        <code>7234</code>
    </icdcodes>
    <state>AR</state>
    <country>US</country>
    <clientid>Custom field</clientid>
    <medicalonly></medicalonly>
    <bpcode></bpcode>
    <noicode></noicode>
    <age>54.29158111</age>
    <jobclass>1</jobclass>
    <fulloutput>Y</fulloutput>
    <cfactors>
        <depression>0</depression>
        <cardiac>0</cardiac>
        <diabetes>0</diabetes>
        <hypertension>1</hypertension>
        <legalrep></legalrep>
        <obesity>0</obesity>
        <smoker>0</smoker>
        <subabuse>0</subabuse>
    </cfactors>
    <prosummary>
        <icd>
            <code></code>
        </icd>
        <ndc>
            <code></code>
        <ndc>
        <cpt>
            <code>E0849</code>
        </cpt>
        <hcpcs>
            <code></code>
        </hcpcs>
    </prosummary>
</cbcalc>

I need this:

<?xml version="1.0" encoding="UTF-8"?>
<cbcalc>
    <icdcodes>
       <code>81303</code>
       <code>E8889</code>
       <code>81300</code>
       <code>7295</code>
       <code>71942</code>
    </icdcodes>
    <state>AR</state>
    <country>US</country>
    <age>61.43463381</age>
    <jobclass>1</jobclass>
    <fulloutput>Y</fulloutput>
    <cfactors>
       <depression>0</depression>
       <cardiac>0</cardiac>
       <diabetes>0</diabetes>
       <hypertension>0</hypertension>
       <obesity>0</obesity>
       <smoker>0</smoker>
       <subabuse>0</subabuse>
    </cfactors>
    <prosummary>
       <icd>
       </icd>
       <ndc>
       </ndc>
       <cpt>
               <code>99212</code>
               <code>73080</code>
               <code>00378415101</code>
               <code>00406035705</code>
               <code>63304045830</code>
               <code>99202</code>
               <code>WC101</code>
               <code>29105</code>
               <code>A4565</code>
               <code>73070</code>
               <code>73090</code>
               <code>99203</code>
       </cpt>
       <hcpcs>
       </hcpcs>
    </prosummary>
    <icdcodes>
       <code>7234</code>
    </icdcodes>
    <state>AR</state>
    <country>US</country>
    <age>54.29158111</age>
    <jobclass>1</jobclass>
    <fulloutput>Y</fulloutput>
    <cfactors>
       <depression>0</depression>
       <cardiac>0</cardiac>
       <diabetes>0</diabetes>
       <hypertension>1</hypertension>
       <obesity>0</obesity>
       <smoker>0</smoker>
       <subabuse>0</subabuse>
    </cfactors>
    <prosummary>
       <icd>
       </icd>
       <ndc>
       </ndc>
       <cpt>
               <code>E0849</code>
       </cpt>
       <hcpcs>
       </hcpcs>
    </prosummary>
</cbcalc>

I tried this but it didn't work.

codes = '\n'.join([f'       <code>{item}</code>'
 for item in row[0].split('~')])

I'm losing data because I have the tildes in the list. How do I extract the data that has a tilde so that I get all my data in XML format?

  • You should not build XML with string formatting but DOM methods in `xml.etree.ElementTree` library. – Parfait Dec 29 '20 at 01:05
  • @Parfait I've updated my code with a test csv file I've been using. – UndefinedKid01 Dec 29 '20 at 12:12
  • @Parfait looks like it didn't copy in correctly. `Age at DOL` is all one header. And how do I create a new tree after each row? So that each row would start with ``and end with `` – UndefinedKid01 Dec 29 '20 at 16:31

2 Answers2

0

Consider DOM methods with xml.etree.ElementTree iterating through csv.DictReader to reference data by dictionary keys.

Assuming actual CSV data includes:

playcodes,Benefit State,Age at DOL,BEHAVIORAL,CARDIAC,DIABETES,HYPERTENSION,OBESE,SMOKER,SUBSTANCE,CPT_codes
81303~E8889~81300~7295~71942,AR,61.43463381,0,0,0,0,0,0,0,99212~73080~00378415101~00406035705~63304045830~99202~WC101~29105~A4565~73070~73090~99203
7234,AR,54.29158111,0,0,0,1,0,0,0,E0849

See code adjustment:

import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom                 # FOR PRETTY PRINT

# INITIALIZING XML FILE
root = ET.Element('root') 

# READING CSV FILE
with open("medical_test.csv") as f: 
   reader = csv.DictReader(f) 
      
   # WRITING TO XML NODES 
   for i, row in enumerate(reader, start=1):
       cbNode = ET.SubElement(root, "cbalc")
       icdNode = ET.SubElement(cbNode, "icdcodes") 

       for code in row['playcodes'].split('~'):
           ET.SubElement(icdNode, "code").text = code

       ET.SubElement(cbNode, "state").text = row['Benefit State']
       ET.SubElement(cbNode, "country").text = "US"  
       ET.SubElement(cbNode, "age").text = row['Age at DOL']
       ET.SubElement(cbNode, "jobclass").text = "1" 
       ET.SubElement(cbNode, "fulloutput").text ="Y"

       cfNode = ET.SubElement(cbNode, "cfactors")
       for k in ['BEHAVIORAL', 'CARDIAC', 'DIABETES',
                 'HYPERTENSION', 'OBESE', 'SMOKER', 'SUBSTANCE']:
           ET.SubElement(cfNode, k.lower()).text = str(row[k])

       psNode = ET.SubElement(cbNode, "prosummary")
       ET.SubElement(psNode, "icd")
       ET.SubElement(psNode, "ndc") 

       cptNode = ET.SubElement(psNode, "cpt")
       for code in row['CPT_codes'].split('~'):
           ET.SubElement(cptNode, "code").text = code

       ET.SubElement(psNode, "hcpcs")

# SAVING XML FILE 
doc = ET.tostring(root, method='xml', encoding="UTF-8")
with open(f'Output.xml', 'wb') as f: 
      f.write(doc)

# PRETTY PRINT VERSION   
dom = xml.dom.minidom.parseString(doc)
with open(f'Output.xml', 'wb') as f: 
  f.write(dom.toprettyxml(encoding="UTF-8"))                

Output

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <cbalc>
        <icdcodes>
            <code>81303</code>
            <code>E8889</code>
            <code>81300</code>
            <code>7295</code>
            <code>71942</code>
        </icdcodes>
        <state>AR</state>
        <country>US</country>
        <age>61.43463381</age>
        <jobclass>1</jobclass>
        <fulloutput>Y</fulloutput>
        <cfactors>
            <behavioral>0</behavioral>
            <cardiac>0</cardiac>
            <diabetes>0</diabetes>
            <hypertension>0</hypertension>
            <obese>0</obese>
            <smoker>0</smoker>
            <substance>0</substance>
        </cfactors>
        <prosummary>
            <icd/>
            <ndc/>
            <cpt>
                <code>99212</code>
                <code>73080</code>
                <code>00378415101</code>
                <code>00406035705</code>
                <code>63304045830</code>
                <code>99202</code>
                <code>WC101</code>
                <code>29105</code>
                <code>A4565</code>
                <code>73070</code>
                <code>73090</code>
                <code>99203</code>
            </cpt>
            <hcpcs/>
        </prosummary>
    </cbalc>
    <cbalc>
        <icdcodes>
            <code>7234</code>
        </icdcodes>
        <state>AR</state>
        <country>US</country>
        <age>54.29158111</age>
        <jobclass>1</jobclass>
        <fulloutput>Y</fulloutput>
        <cfactors>
            <behavioral>0</behavioral>
            <cardiac>0</cardiac>
            <diabetes>0</diabetes>
            <hypertension>1</hypertension>
            <obese>0</obese>
            <smoker>0</smoker>
            <substance>0</substance>
        </cfactors>
        <prosummary>
            <icd/>
            <ndc/>
            <cpt>
                <code>E0849</code>
            </cpt>
            <hcpcs/>
        </prosummary>
    </cbalc>
</root>
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • how do I create a new tree after each row? So that each row would start with ``and end with ``. I believe that's what's needed for an individual soap call, correct? – UndefinedKid01 Dec 29 '20 at 16:52
  • Per comment above, for each new XML document with `` as root, integrate DOM build and write inside a `enumerate` loop on `csv.DictReader` object. – Parfait Dec 29 '20 at 16:52
  • does it have to be XML document each time? I thought it could be a single document. – UndefinedKid01 Dec 29 '20 at 16:54
  • I'm a little confused. DO you mind showing me how to do it? I'm new tow having to write csv to xml – UndefinedKid01 Dec 29 '20 at 16:58
  • By W3C rules, XML documents can only have **one** root. Therefore, to include two `` in same document, this element must be a child of a *different* root. Any compliant SOAP tool should conform to XML markup rules. See edit of two `` in same document with a different root. – Parfait Dec 29 '20 at 17:14
  • Ok that makes sense. Do you mind showing me how to enumerate the loop for `csv.DictReader` – UndefinedKid01 Dec 29 '20 at 18:21
  • Specifically, please see last [versions](https://stackoverflow.com/posts/65486255/revisions) of this solution. Generally, you can use `enumerate` on any iterable in Python (list, tuple, dict, generator, etc.). Read docs, tutorials, etc. on the method. – Parfait Dec 29 '20 at 19:40
-1

There a couple of ways to address it, but I would prefer using string manipulation first (effectively, a template), followed by an import to xml (I use lxml below, rather than etree.ElementTree, because I prefer its xpath support, but you can do it either way.

from lxml import etree

#start of xml string:
xml_string= """<?xml version="1.0" encoding="UTF-8"?>
 <root>"""

for datum in data:
    ipcodes_str =''
    icdcodes = datum[0].split('~')
    for icd in icdcodes:
        ipcodes_str+=(f'<code>{icd}</code>\n')
    cpts = datum[-1].split('~')
    cpts_str =''
    for cpt in cpts:
        cpts_str+=(f'<code>{cpt}</code>\n')

    #now for the body of the xml string:

    xml_string+=(f"""<cbcalc><icdcodes>
       {ipcodes_str}       
    </icdcodes>
    <state>{datum[1]}</state>
    <country>US</country>
    <age>{datum[2]}</age>
    <jobclass>1</jobclass>
    <fulloutput>Y</fulloutput>
    <cfactors>
       <depression>{datum[3]}</depression>
       <cardiac>{datum[4]}</cardiac>
       <diabetes>{datum[5]}</diabetes>
       <hypertension>{datum[6]}</hypertension>
       <obesity>{datum[7]}</obesity>
       <smoker>{datum[8]}</smoker>
       <subabuse>{datum[9]}</subabuse>
    </cfactors>
    <prosummary>
       <icd>
       </icd>
       <ndc>
       </ndc>
       <cpt>
        {cpts_str}
       </cpt>
       <hcpcs>
       </hcpcs>
    </prosummary></cbcalc>""")

#close the xml string
xml_string+="""</root>"""

#parse the xml string
doc = etree.XML(xml_string.encode())
print(etree.tostring(doc).decode())

The output should be a well formed xml file as in your question.

Jack Fleeting
  • 24,385
  • 6
  • 23
  • 45
  • how do I separate the trees? I overlooked that. The code does exactly what I want it to do, but I messed up and forgot all about separating out the trees. How do I add `` and `` at the end of a row so it's closing after each row. – UndefinedKid01 Dec 29 '20 at 15:30
  • Building XML documents by string concatenation and formatting is not advisable. *See*: [What's so bad about building XML with string concatenation?](https://stackoverflow.com/questions/3034611/whats-so-bad-about-building-xml-with-string-concatenation) – Parfait Dec 29 '20 at 16:07
  • @UndefinedKid01 What do you mean by "separating the trees"? – Jack Fleeting Dec 29 '20 at 17:10
  • so there are two rows of data so both of those rows would be in the api call or in two separate api calls. I thought when making the call, it can to be something like ` data data ` In my file that I will be sending has over 50k rows – UndefinedKid01 Dec 29 '20 at 17:34
  • @UndefinedKid01 If you are willing to use string concatenation - and if I understand your comment correctly - it can be done fairly easily. See edit. – Jack Fleeting Dec 30 '20 at 13:07
  • Hi, yeah that was one of the ways I was looking for! – UndefinedKid01 Dec 30 '20 at 16:22