0

I need a Python script to read the below text from text file:

"Re-integratieassistent – modelnummer rea 202"

and write it into an XML file.

The issue when writing in XML is using UTF-8 encoding as it writes as:

"Re-integratieassistent  modelnummer rea 202"

"-" is missing between "integratieassistent" and "modelnumber"

How do I solve this?

My current code:

with codecs.open(file,encoding='utf-8', errors='ignore', mode="r") as curr_file:
    for line in curr_file.readlines():

        # Increment the counter because we encountered the XML start or begin elements
        #line = line.encode('utf-8')

        if line.find("<soapenv:Envelope") != -1 or line.find("</soapenv:Envelope") != -1 :
            i=i+1
        if (i == 1):
            file_i = codecs.open(inputFolder_new+"/"+filename,encoding='utf-8', mode="a")
            file_i.writelines(line)
        if (i == 3):
            file_o = codecs.open(outputFolder_new+"/"+filename, encoding='utf-8', mode="a")
            file_o.writelines(line)
        if (i == 4):
            file_i.writelines("</soapenv:Envelope>")
            file_o.writelines("</soapenv:Envelope>")
Celeo
  • 5,583
  • 8
  • 39
  • 41
Testuser
  • 1,107
  • 2
  • 7
  • 15

1 Answers1

0

Python's interfaces for processing XML are grouped in the xml package.

You might want to consider xml.etree.ElementTree for modifying an xml file.

import xml.etree.ElementTree as ET
root = ET.parse(file)

Then use root.findall('{namespace}nodename'). or root.find(). You can loop through each node trying to find your item of interest.

Add a subelement with ET.SubElement(an_element_object, 'your element').

It's hard to be more precise without knowing what your XML file looks like.

airstrike
  • 2,270
  • 1
  • 25
  • 26