
How can I comment out an entire specific block and a particular tag of an XML file in Python? In the XML below, there are many <list> tags.
1) I have to comment out the entire block <list> {some_data}</list> where the tag is <list name="list_name1">.
2) In <list name="list_name3">, some <item> elements contain two <p> tags, for example:

<p name="address1">some/address-3</p><p name="address1_1">some/address-1_1</p>

Here, I have to comment out the second <p> tag, i.e. <p name="address1_1">some/address-1_1</p>, in all such instances.

How can we achieve this in Python?
Which is the best XML module in Python?

sample_file.xml

<raml xmlns="abcd.xsd" version="0.1">
    <newData type="hw">
        <header>
          <log action="create" dateTime="2020-01-15T16:45:12.001Z" />
        </header>
        <sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845"> 
            <p name="p_name1">true</p>
            <list name="list_name1">
                <item>
                    <p name="address1">some/address-1</p>
                    <p name="value">some/value-1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">some/value-2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <p name="value">some/value-3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">some/value-4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">some/value-5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">some/value-6</p>
                </item>
            </list>
            <list name="list_name2">
                <item>
                    <p name="address1">some/address-1</p>
                    <p name="value">1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <p name="value">3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">6</p>
                </item>
            </list>
            <list name="list_name3">
                <item>
                    <p name="address1">some/address-1</p>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">1</p>
                </item>
                <item>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">1_1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">3</p>
                </item>
                <item>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">3_3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">6</p>
                </item>
            </list>
        </sampleObject> 
    </newData>

</raml>

output_file.xml should look like this:

<raml xmlns="abcd.xsd" version="0.1">
    <newData type="hw">
        <header>
          <log action="create" dateTime="2020-01-15T16:45:12.001Z" />
        </header>
        <sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845"> 
            <p name="p_name1">true</p>
            <!--<list name="list_name1">
                <item>
                    <p name="address1">some/address-1</p>
                    <p name="value">some/value-1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">some/value-2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <p name="value">some/value-3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">some/value-4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">some/value-5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">some/value-6</p>
                </item>
            </list> -->
            <list name="list_name2">
                <item>
                    <p name="address1">some/address-1</p>
                    <p name="value">1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <p name="value">3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">6</p>
                </item>
            </list>
            <list name="list_name3">
                <item>
                    <p name="address1">some/address-1</p>
                    <!--<p name="address1_1">some/address-1_1</p>-->
                    <p name="value">1</p>
                </item>
                <item>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">1_1</p>
                </item>
                <item>
                    <p name="address1">some/address-2</p>
                    <p name="value">2</p>
                </item>
                <item>
                    <p name="address1">some/address-3</p>
                    <!--<p name="address1_1">some/address-1_1</p>-->
                    <p name="value">3</p>
                </item>
                <item>
                    <p name="address1_1">some/address-1_1</p>
                    <p name="value">3_3</p>
                </item>
                <item>
                    <p name="address1">some/address-4</p>
                    <p name="value">4</p>
                </item>
                <item>
                    <p name="address1">some/address-5</p>
                    <p name="value">5</p>
                </item>
                <item>
                    <p name="address1">some/address-6</p>
                    <p name="value">6</p>
                </item>
            </list>
        </sampleObject> 
    </newData>

</raml>
StackGuru
  • Perhaps this helps: https://stackoverflow.com/q/44416111/407651 – mzjn Apr 08 '20 at 05:33
    *"Which is best xml module in python?"* If by "best" you mean "has the most features", then my answer would be [lxml](http://lxml.de/index.html). – mzjn Apr 08 '20 at 06:11
  • Thanks @mzjn. How can we comment out that entire block? And how can we comment out only the address1_1 tag in items that contain both address1 and address1_1 (not where address1_1 is present alone; refer to the section above)? Can you help me? – StackGuru Apr 08 '20 at 06:25
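Part 2 of the question (commenting out <p name="address1_1"> only in items that also contain <p name="address1">) is not covered by the answer below. Here is a minimal sketch of one way to do it, assuming the standard library's xml.dom.minidom rather than the BeautifulSoup approach used in the answer; the function name comment_paired_address1_1 and the shortened sample are hypothetical.

```python
from xml.dom import minidom

# Shortened, hypothetical excerpt of <list name="list_name3">
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name3">
        <item>
            <p name="address1">some/address-1</p>
            <p name="address1_1">some/address-1_1</p>
            <p name="value">1</p>
        </item>
        <item>
            <p name="address1_1">some/address-1_1</p>
            <p name="value">1_1</p>
        </item>
    </list>
</raml>"""


def comment_paired_address1_1(xml_text):
    """Comment out <p name="address1_1"> only in items that also have address1."""
    doc = minidom.parseString(xml_text)
    for item in doc.getElementsByTagName("item"):
        # Direct <p> children of this <item> (text nodes are skipped)
        ps = [n for n in item.childNodes
              if n.nodeType == n.ELEMENT_NODE and n.tagName == "p"]
        names = {p.getAttribute("name") for p in ps}
        if {"address1", "address1_1"} <= names:
            for p in ps:
                if p.getAttribute("name") == "address1_1":
                    item.replaceChild(doc.createComment(p.toxml()), p)
    # documentElement.toxml() leaves out the <?xml ...?> declaration
    return doc.documentElement.toxml()


result = comment_paired_address1_1(SAMPLE)
print(result)
```

The item that holds address1_1 alone is left untouched, which matches the expected output in the question.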

1 Answer


lxml can replace any element with another node, including a comment. Unfortunately, if you create the text of this comment from an existing element, lxml copies the default namespace declaration (xmlns="abcd.xsd") into the comment text again.

So instead of lxml, I decided to use BeautifulSoup, which treats namespaces more "leniently".
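For comparison, the standard library's xml.dom.minidom (not used in this answer) behaves like BeautifulSoup here: it stores xmlns as an ordinary attribute of the element where it was declared, so serializing a nested element does not repeat it. A minimal sketch, assuming a shortened version of the input:

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's input
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name1">
        <item><p name="address1">some/address-1</p></item>
    </list>
</raml>"""

doc = minidom.parseString(SAMPLE)
lst = doc.getElementsByTagName("list")[0]

# minidom writes only the attributes stored on the element itself; the
# xmlns declaration lives on <raml>, so it is not repeated for <list>
list_text = lst.toxml()
print(list_text)
```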

Try the below code:

from bs4 import BeautifulSoup, Comment

with open('Input.xml') as f:
    soup = BeautifulSoup(f, 'xml')
# Replace each <list> element with a comment holding its markup
for elem in soup.findAll('list'):
    elem.replace_with(Comment(str(elem)))
print(soup.prettify())
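If BeautifulSoup is not available, the same "replace each <list> with a comment" idea can be sketched with the standard library's xml.dom.minidom. This is an alternative technique, not the answer's code; the function name comment_out_all_lists and the shortened sample are hypothetical.

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's input
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <sampleObject>
        <p name="p_name1">true</p>
        <list name="list_name1">
            <item><p name="address1">some/address-1</p></item>
        </list>
        <list name="list_name2">
            <item><p name="address1">some/address-1</p></item>
        </list>
    </sampleObject>
</raml>"""


def comment_out_all_lists(xml_text):
    doc = minidom.parseString(xml_text)
    # getElementsByTagName returns a list snapshot, so replacing
    # nodes while looping over it is safe
    for elem in doc.getElementsByTagName("list"):
        elem.parentNode.replaceChild(doc.createComment(elem.toxml()), elem)
    # Whitespace text nodes are kept, so the original indentation survives
    return doc.documentElement.toxml()


result = comment_out_all_lists(SAMPLE)
print(result)
```

Unlike prettify(), minidom does not re-indent anything, so each <p> stays on one line.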

For your input XML, shortened a bit, I got:

<?xml version="1.0" encoding="utf-8"?>
<raml version="0.1" xmlns="abcd.xsd">
  <newData type="hw">
    <header>
      <log action="create" dateTime="2020-01-15T16:45:12.001Z"/>
    </header>
    <sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845">
      <p name="p_name1">true</p>
<!--<list name="list_name1">
<item>
<p name="address1">some/address-1</p>
<p name="value">1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
</list>-->
<!--<list name="list_name2">
<item>
<p name="address1">some/address-3</p>
<p name="value">3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
</list>-->
    </sampleObject>
  </newData>
</raml>

Edit

If you want to comment out only one list element (e.g. with name attribute set to 'list_name1'), the correction is simple:

findAll has another parameter, attrs (a dictionary), in which you can pass any attribute names and values to narrow down the selection.

In this case change the loop to:

for elem in soup.findAll('list', attrs={'name': 'list_name1'}):
    elem.replace_with(Comment(str(elem)))
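The same name-based filter can be sketched with the standard library's xml.dom.minidom by checking getAttribute("name") inside the loop. This is an alternative to the BeautifulSoup code above, and comment_out_list is a hypothetical helper name.

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's input
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name1">
        <item><p name="address1">some/address-1</p></item>
    </list>
    <list name="list_name2">
        <item><p name="address1">some/address-1</p></item>
    </list>
</raml>"""


def comment_out_list(xml_text, list_name):
    doc = minidom.parseString(xml_text)
    for elem in doc.getElementsByTagName("list"):
        # Plays the role of attrs={'name': ...}: match on the name attribute
        if elem.getAttribute("name") == list_name:
            elem.parentNode.replaceChild(doc.createComment(elem.toxml()), elem)
    return doc.documentElement.toxml()


result = comment_out_list(SAMPLE, "list_name1")
print(result)
```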

To delete the selected elements instead, use the method with the less intuitive name decompose.

To do it, run:

for elem in soup.findAll('list', attrs={'name': 'list_name1'}):
    elem.decompose()
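For comparison, a minimal sketch of the deletion with the standard library's xml.dom.minidom, which uses removeChild instead of decompose. The helper delete_list is hypothetical; it also removes the whitespace-only text node just before the element so no empty indented line is left behind.

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's input
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name1">
        <item><p name="address1">some/address-1</p></item>
    </list>
    <list name="list_name2">
        <item><p name="address1">some/address-2</p></item>
    </list>
</raml>"""


def delete_list(xml_text, list_name):
    doc = minidom.parseString(xml_text)
    for elem in doc.getElementsByTagName("list"):
        if elem.getAttribute("name") == list_name:
            parent = elem.parentNode
            prev = elem.previousSibling
            # Drop the indentation text node in front of the element, if any
            if (prev is not None and prev.nodeType == prev.TEXT_NODE
                    and not prev.data.strip()):
                parent.removeChild(prev)
            parent.removeChild(elem)
    return doc.documentElement.toxml()


result = delete_list(SAMPLE, "list_name1")
print(result)
```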

Edit following the comment about the XML declaration prefix

One recipe to remove the XML declaration (<?xml version="1.0" encoding="utf-8"?>) is to parse the file with an HTML parser instead of passing 'xml' as the second argument to BeautifulSoup.

But then the root element in the output is html, which contains a body element, and the raml element is inside it.

So to drop these two "outer" elements, change the code to:

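# 'lxml' is lxml's HTML parser here: it wraps the document in <html><body>
# and lowercases tag names (e.g. newData becomes newdata)
with open('Input.xml') as f:
    soup = BeautifulSoup(f, 'lxml')
for elem in soup.findAll('list'):
    elem.replace_with(Comment(str(elem)))
print(soup.html.body.raml.prettify())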

Also, each <p> element is then kept on a single line.

It is a bit of a "dirty" solution, but hopefully it leads to the expected result.
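Switching to an HTML parser is not the only way to lose the declaration: the asker reports in the comments that soup.decode_contents() drops it while keeping the 'xml' parser. The same distinction exists in the standard library's xml.dom.minidom, where the document renders with the declaration and its root element renders without it. A minimal sketch, assuming a shortened input:

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's input
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name1">
        <item><p name="address1">some/address-1</p></item>
    </list>
</raml>"""

doc = minidom.parseString(SAMPLE)

# The Document node writes an <?xml version="1.0" ?> line first
with_declaration = doc.toxml()
# The root element alone is written without any declaration
without_declaration = doc.documentElement.toxml()

print(without_declaration)
```

Since the XML parser is kept, tag names such as newData keep their original case.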

Daniel Haley
Valdi_Bo
  • Thank you. How can I comment out only the list block where name="list_name1"? – StackGuru Apr 09 '20 at 17:12
  • Also, can we delete it instead, i.e. delete only the list where name="list_name1"? – StackGuru Apr 09 '20 at 17:13
  • elem.replace_with(Comment(str(elem))) is not commenting anything out in the file. Instead, it prints that entire block. – StackGuru Apr 09 '20 at 20:39
  • elem.decompose() is not deleting that entire block. – StackGuru Apr 09 '20 at 20:39
  • Note that *findAll* in the last example has the *attrs={'name': 'list_name1'}* parameter, so only this **one** instance of *list* is deleted. If you drop the *attrs* parameter, then **all** *list* elements will be deleted. Try it on your own computer. – Valdi_Bo Apr 10 '20 at 05:45
  • @Valdi_Bo: I understood that. Do we need to comment it out in the file separately? I tried "elem.replace_with(Comment(str(elem)))"; it prints that particular block on the console, but the block is never commented out in the actual file. Could you help me? – StackGuru Apr 13 '20 at 08:50
  • @Valdi_Bo: have you tried that? Could you help me? – StackGuru Apr 13 '20 at 17:33
  • It is neither deleting nor commenting. – StackGuru Apr 13 '20 at 18:09
  • All my code does is change the XML tree loaded into memory; so far, no changes have been made to the input file. I think the first step should be to save the changed XML tree to **another** file (don't overwrite the original file until you are sure that the changes are OK). Only after that (once you see the changes are correct) should you change your code to overwrite the original file. – Valdi_Bo Apr 14 '20 at 05:11
  • Thanks @Valdi_Bo. It works perfectly. Is it possible to check whether it is already commented out? If it is, there is no need to comment it again; if not, comment it out. Possible? – StackGuru Apr 14 '20 at 09:22
  • I think it is not needed. A comment node is just a comment, and whatever text is inside it (even if it looks like an XML tag) is not parsed as an XML tag. – Valdi_Bo Apr 14 '20 at 09:25
  • <?xml version="1.0" encoding="utf-8"?>: this tag is added at the beginning. Can we remove it? – StackGuru Apr 14 '20 at 09:27
  • Also, the structure of the XML changes. For example, the element containing some/address-1 above is all on one line, but in the output file this single line is printed on three lines. Can we retain the original alignment/style/structure? – StackGuru Apr 14 '20 at 09:29
  • writelines(data, xml_declaration=False) fails: writelines() takes no keyword arguments. – StackGuru Apr 14 '20 at 09:54
  • @Valdi_Bo: Is it possible to remove it and retain the original XML file indentation/style/alignment? – StackGuru Apr 14 '20 at 09:59
  • I could remove it by using "soup.decode_contents()". Now I need to retain the original indentation, or something similar to pretty-print. – StackGuru Apr 14 '20 at 11:01
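The points raised in these comments (changes happen only in memory, save to a separate file first, drop the declaration, keep the original layout, and whether re-running comments the block twice) can be combined in one minimal sketch. It swaps in the standard library's xml.dom.minidom instead of BeautifulSoup because minidom keeps whitespace text nodes and does not re-indent on output (apart from small normalizations such as <log ... /> becoming <log .../>). The helper comment_out_list and the file names are hypothetical; a temporary directory stands in for real files.

```python
import os
import tempfile
from xml.dom import minidom


def comment_out_list(src_path, dst_path, list_name):
    """Comment out <list name=list_name> and save to a separate file."""
    doc = minidom.parse(src_path)
    for elem in doc.getElementsByTagName("list"):
        if elem.getAttribute("name") == list_name:
            elem.parentNode.replaceChild(doc.createComment(elem.toxml()), elem)
    with open(dst_path, "w", encoding="utf-8") as f:
        # documentElement.toxml() skips the <?xml ...?> declaration;
        # whitespace text nodes keep the original indentation
        f.write(doc.documentElement.toxml())


SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
    <list name="list_name1">
        <item><p name="address1">some/address-1</p></item>
    </list>
</raml>"""

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Input.xml")
    out1 = os.path.join(tmp, "output_file.xml")
    out2 = os.path.join(tmp, "output_again.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    comment_out_list(src, out1, "list_name1")
    # Running again changes nothing: the block is now a comment node,
    # not a <list> element, so it is not matched a second time
    comment_out_list(out1, out2, "list_name1")
    with open(out1, encoding="utf-8") as f:
        first = f.read()
    with open(out2, encoding="utf-8") as f:
        second = f.read()

print(first)
```

Once the output looks right, dst_path can be pointed at the original file, as suggested in the comments above.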