I am new to Python and am trying to modify some XML configuration files on my local system.
Input: I have an XML file (say, Test.xml) with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<SocketTimeout>5000</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>
WHAT I WANT TO ACHIEVE: I want to do the following two things:
Part 1: I want to change the value of the SocketTimeout element (only the one under Composer) to 60, and also add a comment above it (e.g. "Changed this value to reduce SocketTimeout"). The resulting Test.xml should look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!-- Changed this value to reduce SocketTimeout -->
<SocketTimeout>60</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>
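A minimal sketch of Part 1 that stays text-based, so comments and whitespace outside the edited line are left alone. It assumes each <SocketTimeout> under <Composer> sits on its own line. `set_composer_timeout` and the sample string (with its two-space indentation) are hypothetical. The new comment reuses the indentation captured from the SocketTimeout line.

```python
import re

def set_composer_timeout(xml_text, new_value, comment):
    """Hypothetical helper: change <SocketTimeout> inside <Composer> only and
    put an XML comment on the line above it. Everything else is left as-is."""
    # The whole <Composer>...</Composer> block; DOTALL lets '.' cross newlines
    composer = re.compile(r"<Composer>.*?</Composer>", re.DOTALL)
    # A SocketTimeout line, capturing its leading indentation in group 1
    timeout = re.compile(r"^([ \t]*)<SocketTimeout>\d+</SocketTimeout>", re.MULTILINE)

    def fix_block(match):
        # Rewrite only the first SocketTimeout line inside this block
        return timeout.sub(
            lambda m: (f"{m.group(1)}<!-- {comment} -->\n"
                       f"{m.group(1)}<SocketTimeout>{new_value}</SocketTimeout>"),
            match.group(0),
            count=1,
        )

    return composer.sub(fix_block, xml_text, count=1)


sample = """<JavaHost xmlns="SomeInfo/v1.1">
  <Domain>
    <MessageProcessor>
      <SocketTimeout>500</SocketTimeout>
    </MessageProcessor>
    <Composer>
      <SocketTimeout>5000</SocketTimeout>
      <Enabled>true</Enabled>
    </Composer>
  </Domain>
</JavaHost>
"""

result = set_composer_timeout(sample, 60, "Changed this value to reduce SocketTimeout")
print(result)
```

To apply it to the real file, read Test.xml into a string, pass it through the function, and write the result back. Keep in mind that a regex has no notion of XML structure, so a commented-out <Composer> block, for example, would be matched as well.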
Part 2: I also want to add a new element under the Domain element in Test.xml, so that the file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!-- Changed this value to reduce SocketTimeout -->
<SocketTimeout>60</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
<New_tag>
<!-- New Tag -->
<Enabled>true</Enabled>
</New_tag>
</Domain>
</JavaHost>
That’s all I want :)
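For Part 2, a similar text-based sketch: find the </Domain> line and insert the new snippet just before it, indented one level deeper than that line. `add_element_to_domain` is a hypothetical helper, and the two-space `indent` default is an assumption; pass whatever unit your file actually uses (e.g. "\t").

```python
import re

def add_element_to_domain(xml_text, new_block, indent="  "):
    """Hypothetical helper: insert new_block (a multi-line XML snippet) just
    before the </Domain> line, one indentation level deeper than that line."""
    # The </Domain> line, capturing its leading whitespace in group 1
    closing = re.compile(r"^([ \t]*)</Domain>", re.MULTILINE)
    match = closing.search(xml_text)
    if match is None:
        raise ValueError("no </Domain> line found")
    base = match.group(1) + indent  # indentation for Domain's children
    lines = [base + line for line in new_block.strip("\n").splitlines()]
    # match.start() is the start of the </Domain> line, so the snippet
    # becomes the last child of <Domain>
    return xml_text[:match.start()] + "\n".join(lines) + "\n" + xml_text[match.start():]


sample = (
    '<JavaHost xmlns="SomeInfo/v1.1">\n'
    "  <Domain>\n"
    "    <Composer>\n"
    "      <Enabled>true</Enabled>\n"
    "    </Composer>\n"
    "  </Domain>\n"
    "</JavaHost>\n"
)
new_tag = "<New_tag>\n  <!-- New Tag -->\n  <Enabled>true</Enabled>\n</New_tag>"

result = add_element_to_domain(sample, new_tag)
print(result)
```

Because it only splices text in at one position, everything else in the file, including the existing comments, comes back unchanged.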
WHAT I HAVE TRIED:
To achieve this, I considered the following options:
Minidom/ElementTree/lxml: in my attempts, these removed the comments from the file and/or changed its formatting when I wrote it back.
Regex: doesn’t remove comments and doesn’t disturb the formatting. So I opted for regex; below is what I started with, but it is not working :(
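import re

# Raw string, so the backslashes in the Windows path need no escaping
path = r'C:\Users\Dell\Desktop\Test.xml'

# Read the whole file as text; comments and whitespace are untouched
with open(path, 'r', encoding='utf-8') as fh:
    subject = fh.read()

# Match the <Composer>...</Composer> block. No square brackets (\[ \] would
# match literal brackets), and re.DOTALL lets '.' span several lines
pattern = re.compile(r"<Composer>.*?</Composer>", re.DOTALL)

# Inside that block only, replace whatever number SocketTimeout holds
# (the Composer value is 5000, not 500)
result = pattern.sub(
    lambda match: re.sub(r"<SocketTimeout>\d+</SocketTimeout>",
                         "<SocketTimeout>60</SocketTimeout>",
                         match.group(0)),
    subject)

# Write the modified text back to the same file
with open(path, 'w', encoding='utf-8') as f_out:
    f_out.write(result)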
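A quick check of why the pattern I started with, r"\[<Composer>\].*?\[/<Composer>\]", never matches anything: run each piece against a small in-memory string. This is only a diagnostic sketch; `text` is a made-up fragment shaped like the Composer block in Test.xml.

```python
import re

# A made-up fragment shaped like the Composer block in Test.xml
text = "<Composer>\n<SocketTimeout>5000</SocketTimeout>\n</Composer>"

# \[ and \] match literal square brackets, which the file doesn't contain,
# and "[/<Composer>]" is not the closing tag "</Composer>"
original = re.compile(r"\[<Composer>\].*?\[/<Composer>\]")
print(original.search(text))  # None

# Without re.DOTALL, '.' does not match a newline, so a multi-line block fails too
no_dotall = re.compile(r"<Composer>.*?</Composer>")
print(no_dotall.search(text))  # None

# With DOTALL the whole block matches
fixed = re.compile(r"<Composer>.*?</Composer>", re.DOTALL)
print(fixed.search(text).group(0) == text)  # True

# The literal replacement target is not in the block either:
# the Composer value is 5000, not 500
print("<SocketTimeout>500</SocketTimeout>" in text)  # False
```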
Any ideas on how to implement this, or corrections to my mistakes, would be much appreciated. Although I am new to Python, I will try my best to work through the suggestions.
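On the parser option: ElementTree's default parser does drop comments, but since Python 3.8 `ET.TreeBuilder(insert_comments=True)` keeps the comments inside the root element, and ElementTree stores the whitespace between elements as `text`/`tail`, so indentation generally survives. A minimal sketch under those assumptions; `update_with_elementtree` and the sample are hypothetical, and `ET.tostring` does not write back the original `<?xml ... standalone="no"?>` declaration.

```python
import xml.etree.ElementTree as ET

NS = "SomeInfo/v1.1"  # default namespace declared on <JavaHost> in Test.xml
# Write that namespace back as xmlns="..." instead of an "ns0:" prefix
ET.register_namespace("", NS)


def update_with_elementtree(xml_text):
    """Hypothetical helper: the Part 1 change, done with ElementTree."""
    # insert_comments=True (Python 3.8+) keeps <!-- ... --> inside the root
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    # Namespaced lookups: every tag is really "{SomeInfo/v1.1}TagName"
    composer = root.find(f"{{{NS}}}Domain/{{{NS}}}Composer")
    timeout = composer.find(f"{{{NS}}}SocketTimeout")
    timeout.text = "60"

    # New comment right before <SocketTimeout>; its tail reuses the
    # whitespace that precedes Composer's first child (the indentation)
    comment = ET.Comment(" Changed this value to reduce SocketTimeout ")
    comment.tail = composer.text
    composer.insert(list(composer).index(timeout), comment)

    # Serializes the root element only; no <?xml ...?> declaration is written
    return ET.tostring(root, encoding="unicode")


sample = (
    '<JavaHost xmlns="SomeInfo/v1.1">\n'
    "  <Domain>\n"
    "    <!-- keep me -->\n"
    "    <Composer>\n"
    "      <SocketTimeout>5000</SocketTimeout>\n"
    "      <Enabled>true</Enabled>\n"
    "    </Composer>\n"
    "  </Domain>\n"
    "</JavaHost>"
)

result = update_with_elementtree(sample)
print(result)
```

Unlike the regex approach, this works on the parsed structure, so a commented-out <Composer> cannot be picked up by mistake; the trade-off is that the serializer rewrites details such as the declaration line and attribute quoting.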