How to modify xml using Beautiful Soup?

Question

I am trying to modify the lookup data elements in an xml file. A snippet of the xml is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Options>
        <SampleRate>1000</SampleRate>
        <MaxStateSize>1</MaxStateSize>
        <MaxOutputSize>1</MaxOutputSize>
    </Options>

    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <RowState cacheref="Soc"/>
                <ColState cacheref="ThermalState"/>
                <LookupData>
                    0.02597518381655694900, 0.02513715386193249600, 0.02394715132636577100, 0.02325996676357371800, 0.02317075771456176400, 0.02277814077034603900, 0.02267913709322775700, 0.02258569292134297900, 0.02235026503875497600, 0.02222478423822949300, 0.02207606555239715500, 0.02198493491067361700, 0.02188144525929673300, 0.02167985791309091600, 0.02145797158835977700, 0.02137484908165417400, 0.02126561803424023600, 0.02124462299304301700, 0.02123310358079429400, 0.02126287857906075300, 0.02094998489960795500, 0.02073326148328196600, 0.02062489977511897100, 0.02038933084432985300;
                </LookupData>
                <MeasurementPointsRow desc="StateOfCharge">
                -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                </MeasurementPointsRow>
                <MeasurementPointsColumn desc="ThermalState">
                25
                </MeasurementPointsColumn>
            </Object>
        </MyRser>

I want to modify the lookup data and save a copy of the xml with that modification. This is how I do it:

from bs4 import BeautifulSoup

with open('....xml') as fp:
    contents = fp.read()
    soup = BeautifulSoup(contents, 'lxml')

    tag = soup.find(elem_name).find(elem_path).lookupdata
    tag.replace_with(str(values))

# saves the modified data as a new xml version
teslaname = elem_name + key

with open('modified.xml', 'w') as file:
    file.write(str(soup))
    file.close()

But when I do this, the specific modification is made, yet the XML structure changes as well:

 <?xml version="1.0" encoding="UTF-8"?><html><body><configuration>
<options>
<samplerate>1000</samplerate>
<maxstatesize>1</maxstatesize>
<maxoutputsize>1</maxoutputsize>
</options>
<customdefinitions>
<myrser class="OhmicResistance">
<object class="LookupObj2dWithState">
<rowstate cacheref="Soc"></rowstate>
<colstate cacheref="ThermalState"></colstate>
0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339
<measurementpointsrow desc="StateOfCharge">
                -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                </measurementpointsrow>
<measurementpointscolumn desc="ThermalState">
                25
                </measurementpointscolumn>
</object>
</myrser>

I want to preserve the structure and modify only the data. I know it can be done with ElementTree, but given how my code needs to work, BeautifulSoup is simpler to use. So, using only BeautifulSoup, how can I edit the XML and save a copy without losing its original structure? Any help would be appreciated.

score 1 · Accepted Answer · answered Nov 10 '20 at 17:33

Using lxml, you can do something along the lines of:

from lxml import etree
config = """[your xml above, corrected - it's not well formed]"""
new_values = "1,2,3,4"
doc = etree.XML(config.encode())
target = doc.xpath('//LookupData')[0]
target.text = new_values
print(etree.tostring(doc).decode())

Output:

<Configuration>
    <Options>
        <SampleRate>1000</SampleRate>
        <MaxStateSize>1</MaxStateSize>
        <MaxOutputSize>1</MaxOutputSize>
    </Options>

    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <RowState cacheref="Soc"/>
                <ColState cacheref="ThermalState"/>
                <LookupData>1,2,3,4</LookupData>
                <MeasurementPointsRow desc="StateOfCharge">
                -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                </MeasurementPointsRow>
                <MeasurementPointsColumn desc="ThermalState">
                25
                </MeasurementPointsColumn>
            </Object>
        </MyRser>
        </CustomDefinitions>
    </Configuration>
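
Since the question asks about saving a modified copy, here is a minimal sketch of the same approach reading from and writing to files. The file names `original.xml` and `modified.xml` and the shortened sample document are assumptions made for illustration; the temporary directory is only there so the example runs on its own.

```python
import os
import tempfile

from lxml import etree

# Shortened stand-in for the configuration file (assumed content).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <LookupData>0.025, 0.024;</LookupData>
        </MyRser>
    </CustomDefinitions>
</Configuration>
"""

# Hypothetical file names inside a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "original.xml")
dst = os.path.join(workdir, "modified.xml")
with open(src, "wb") as fh:
    fh.write(sample)

# Parse the file, change the text of the first <LookupData>, save a copy.
tree = etree.parse(src)
target = tree.xpath("//LookupData")[0]
target.text = "1,2,3,4"
tree.write(dst, xml_declaration=True, encoding="UTF-8")

with open(dst, "rb") as fh:
    saved = fh.read().decode("utf-8")
print(saved)
```

Passing `xml_declaration=True` asks lxml to write the `<?xml ...?>` line back out; the original file is left untouched because the result goes to a different path.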

What does the index `[0]` signify in doc.xpath(...)? Secondly, for further elements in the XML, I want to automate the XPath with something like: soc = doc.xpath('//CustomDefinitions//'+elem1+'[@class="'+elem2+'"]//MeasurementPointsRow[@ desc="StateOfCharge"]/text()'). It gives me an error at elem2, which it is unable to read. How can I fix that? Here elem1 and elem2 are lists that I want to loop over. – user14447985, Nov 10 '20 at 21:02
As to the first part: `doc.xpath('//LookupData')` returns a `list` (of length 1 in this case, though that doesn't affect the outcome here); you have to use the `[0]` index to access the target node in that list so you can set its `text` attribute. As to the second part: I don't really understand it and, in any case, you should probably post it as a separate question with all the necessary details. – Jack Fleeting, Nov 10 '20 at 21:28
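
On the follow-up about looping: one possible reading is that the two lists are being concatenated into the XPath string as whole lists rather than one item at a time. The sketch below is only a guess at the intent. It pairs the lists with `zip` and passes each pair to lxml as XPath variables instead of building the string by hand. The second element (`MyCap` with class `Capacity`) and the shortened values are made up for illustration.

```python
from lxml import etree

# Shortened sample; MyCap/Capacity is a hypothetical second entry.
doc = etree.XML(b"""<Configuration>
  <CustomDefinitions>
    <MyRser class="OhmicResistance">
      <Object class="LookupObj2dWithState">
        <MeasurementPointsRow desc="StateOfCharge">-5, 0, 100</MeasurementPointsRow>
      </Object>
    </MyRser>
    <MyCap class="Capacity">
      <Object class="LookupObj2dWithState">
        <MeasurementPointsRow desc="StateOfCharge">0, 50</MeasurementPointsRow>
      </Object>
    </MyCap>
  </CustomDefinitions>
</Configuration>""")

element_names = ["MyRser", "MyCap"]            # plays the role of elem1
class_names = ["OhmicResistance", "Capacity"]  # plays the role of elem2

# $name and $cls are XPath variables filled in from keyword arguments.
query = ('//CustomDefinitions//*[local-name()=$name][@class=$cls]'
         '//MeasurementPointsRow[@desc="StateOfCharge"]/text()')

soc = {}
for name, cls in zip(element_names, class_names):
    hits = doc.xpath(query, name=name, cls=cls)  # always a list
    if hits:
        # The text node is one comma-separated string; split it into floats.
        soc[name] = [float(v) for v in hits[0].split(",")]

print(soc)
```

Checking `if hits:` before indexing avoids an `IndexError` when a name/class pair matches nothing.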

score 0 · Answer 2 · answered Nov 10 '20 at 15:42

To keep the XML structure, use the .prettify() method when writing to the file:

file.write(str(soup.prettify()))

Note:

BeautifulSoup converts the XML tags to lowercase.
Since you're opening the file with a context manager, there's no need to call file.close(); the file is closed automatically when the indented block exits.

MendelG

Thank you for the response. The issue is that modifying the file through BeautifulSoup adds an html tag to the XML, and that is why I get an error in the later code where I try to use and run the modified XML. – user14447985 Nov 10 '20 at 16:16
About the first point you mentioned: is there a way to turn off the conversion of all tags to lowercase? – user14447985 Nov 10 '20 at 16:17
@user14447985 I'm not sure whether you can stop BS from converting the tags to lowercase. See if changing your parser to `html.parser` instead of `lxml` avoids adding the extra tags, or see if [this](https://stackoverflow.com/a/36144530/12349734) solves the problem. – MendelG Nov 10 '20 at 16:34
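
Regarding the two issues raised in these comments (the added html tag and the lowercased tag names), one thing that may be worth trying is Beautiful Soup's `xml` parser, which needs lxml installed. The sketch below uses a shortened sample document and made-up replacement values. It also sets the tag's `.string` rather than calling `replace_with`, so that the `<LookupData>` element itself stays in the tree.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the configuration file (assumed content).
xml_text = """<Configuration>
    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <RowState cacheref="Soc"/>
                <LookupData>0.025, 0.024;</LookupData>
            </Object>
        </MyRser>
    </CustomDefinitions>
</Configuration>"""

# "xml" selects an XML parser instead of an HTML one.
soup = BeautifulSoup(xml_text, "xml")

# Tag names keep their original case, so search with the exact name.
lookup = soup.find("LookupData")

# Assigning .string swaps the text but keeps the element.
lookup.string = "1, 2, 3, 4"

result = str(soup)
print(result)
```

The text in `result` could then be written to a new file in the same way as in the question.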
