I'm currently trying to make the input file for a hydrologic model (HBV-light) compatible with external calibration software (PEST). HBV-light requires that its input files be in XML format, while PEST can only read text files. My issue relates to writing a script that will automatically convert a parameter set written by PEST (in CSV format) to an XML file that can be read by HBV-light.

Here's a short example of a text file that can be written by PEST:

W,X,Y,Z
1,2,3,4

and this is how I'm attempting to organize the XML file:

<Parameters>
   <GroupA>
      <W>1</W>
      <X>2</X>
   </GroupA>
   <GroupB>
      <Y>3</Y>
      <Z>4</Z>
   </GroupB>
</Parameters>

I don't have much programming experience, but here is the Python code that I have written so far:

import csv

csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParamters>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')
    else: 
        for i in range(0, 2):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

    rowNum +=1

xmlData.write('</CatchmentParameters>' + "\n")
xmlData.write('<VegetationZone>' + "\n")
xmlData.write('<VegetationZoneParameters>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(3, 5):
            tags[i] = tags[i].replace(' ', '_')
    else: 
        for i in range(3, 5):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

    rowNum +=1

xmlData.write('</VegetationZoneParameters>' + "\n")
xmlData.write('</VegetationZone>' + "\n")
xmlData.write('</Catchment>' + "\n")
xmlData.close()

I can get Group A (specifically, CatchmentParameters) to be written, but the Group B section is not being written. I'm not sure what to do!

Geogrammer
  • As "first time questions" go, this one is very well written! I don't have a quick answer for you - but do take a look at http://stackoverflow.com/questions/3605680/creating-a-simple-xml-file-using-python – Floris Sep 28 '13 at 21:12
  • I did have a slow answer for you... see below. – Floris Sep 28 '13 at 22:25
  • One thing you might consider in future - your code seemed to process the file twice, as though you wanted all the Catchment parameters before any of the Vegetation parameters - and that is how I wrote my answer. It appears (from the accepted answer) that you intended something different. An example with two lines of data would have shown the difference. Remember that people helping on SO cannot read minds - they can only extrapolate from the information you give. – Floris Sep 30 '13 at 00:56

3 Answers

I think the issue is the range in the second part: range(3, 5) yields indices 3 and 4, i.e. the fourth and fifth columns, but the file has only four columns. What you want is probably range(2, 4), which yields indices 2 and 3, i.e. the third and fourth columns.
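
To make the indexing concrete, here is a minimal sketch using the header row from the question's example. The variable names (second_group, wrong_group) are only for illustration and are not part of the original script.

```python
tags = ['W', 'X', 'Y', 'Z']  # header row from the question's example

# range(start, stop) counts from start up to, but not including, stop.
second_group = [tags[i] for i in range(2, 4)]  # indices 2 and 3
print(second_group)

# range(3, 5) asks for indices 3 and 4; a four-item list has no index 4.
try:
    wrong_group = [tags[i] for i in range(3, 5)]
except IndexError:
    wrong_group = None
    print("index 4 does not exist in a four-column row")
```

Note that this only covers the indexing part of the problem; it does not explain why nothing at all is written for the second group.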

GL770

I think the loop is wrong. See if this works for you:

#! /usr/bin/env python
# coding= utf-8

import csv

csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParameters>' + "\n")
rowNum = 0


for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')

    else:
        for i in range(0, 2):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

        xmlData.write('</CatchmentParameters>' + "\n")
        xmlData.write('<VegetationZone>' + "\n")
        xmlData.write('<VegetationZoneParameters>' + "\n")

        for i in range(2, 4):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

        xmlData.write('</VegetationZoneParameters>' + "\n")
        xmlData.write('</VegetationZone>' + "\n")

    rowNum +=1

xmlData.write('</Catchment>' + "\n")
xmlData.close()
Milan Zavišić
  • Thanks so much for playing around with the problem I was having!! This was a great way to solve the issue I was having. Still an extreme novice with programming, so still learning how loops work, and the way that you altered my code has illuminated where the problem was! – Geogrammer Sep 28 '13 at 23:15
  • I presume you found that your first tag `CatchmentParamters` is misspelled. – Floris Sep 30 '13 at 00:51

The problem is that you iterate over the contents of the csv file twice - it appears that you need to "rewind" after your first loop. There is also a minor indexing issue, with the second range needing to be range(2,4) and not range(3,5) as was already pointed out.
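
The "rewind" point can be seen in isolation with a small sketch. It assumes Python 3 and uses an in-memory io.StringIO object as a stand-in for myCSVfile.csv; the names first_pass, second_pass and after_rewind are made up for the illustration.

```python
import csv
import io

# In-memory stand-in for myCSVfile.csv (assumption: same two-line content).
sample = io.StringIO("W,X,Y,Z\n1,2,3,4\n")
reader = csv.reader(sample)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # the reader is exhausted, so this is empty

# Rewind the underlying file object and build a fresh reader.
sample.seek(0)
reader = csv.reader(sample)
after_rewind = [row for row in reader]

print(len(first_pass), len(second_pass), len(after_rewind))
```

Another option along the same lines is to read the rows once into a list, for example rows = list(csv.reader(f)), and loop over that list as many times as needed.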

I created a piece of code that appears to work. It can probably be improved upon by people who understand Python properly. Note - I added a couple of print statements to convince myself I understood what is happening. If you don't open the csvFile a second time (at "starting the second for loop"), then no rows get printed. That's your clue that this is the problem.

import csv

csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParameters>' + "\n")
rowNum = 0
for row in csvData:
    print("row is " + str(row))
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(0, 2):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

    rowNum +=1

xmlData.write('</CatchmentParameters>' + "\n")
xmlData.write('<VegetationZone>' + "\n")
xmlData.write('<VegetationZoneParameters>' + "\n")
rowNum = 0
print("starting the second for loop")
# re-open the file: the first loop already consumed every row of the old reader
csvData = csv.reader(open(csvFile))
for row in csvData:
    print("row is now " + str(row))
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(2, 4):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(2, 4):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")

    rowNum +=1

xmlData.write('</VegetationZoneParameters>' + "\n")
xmlData.write('</VegetationZone>' + "\n")
xmlData.write('</Catchment>' + "\n")
xmlData.close()

Using the above with the little test file you had given resulted in the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<CatchmentParameters>
    <W>1</W>
    <X>2</X>
</CatchmentParameters>
<VegetationZone>
<VegetationZoneParameters>
    <Y>3</Y>
    <Z>4</Z>
</VegetationZoneParameters>
</VegetationZone>
</Catchment>
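
Since hand-written tags are easy to mistype, one possible improvement is to let the standard library build the XML. The following is only a sketch, assuming Python 3 and the simplified layout from the question (Parameters, GroupA, GroupB); the GROUPS table and the csv_to_parameters function are hypothetical names, and the real HBV-light schema may need different tags.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping of CSV columns to XML groups, taken from the
# question's simplified example (W, X in one group; Y, Z in the other).
GROUPS = [("GroupA", ["W", "X"]), ("GroupB", ["Y", "Z"])]

def csv_to_parameters(csv_file):
    """Build a <Parameters> element from the first data row of csv_file."""
    reader = csv.DictReader(csv_file)  # uses the header row as dictionary keys
    row = next(reader)                 # first (and here only) data row
    root = ET.Element("Parameters")
    for group_name, columns in GROUPS:
        group = ET.SubElement(root, group_name)
        for name in columns:
            # replace spaces with underscores so the tag name is valid XML
            child = ET.SubElement(group, name.replace(" ", "_"))
            child.text = row[name]
    return root

# In-memory stand-in for myCSVfile.csv
sample = io.StringIO("W,X,Y,Z\n1,2,3,4\n")
root = csv_to_parameters(sample)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The file is read in a single pass, so no rewind is needed, and every opening tag gets a matching closing tag automatically. To write the result to disk, ET.ElementTree(root).write('myXMLfile.xml', encoding='utf-8', xml_declaration=True) should do it.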

Problem solved?

Floris
  • Yes!! Thanks so much for taking the time to play around with it!! It also appears that there were (at least) two easy ways to fix this problem. I'm still getting a handle on loops, so it makes sense that that's where the problem was arising. – Geogrammer Sep 28 '13 at 23:13
  • Glad you got your solution! – Floris Sep 28 '13 at 23:51