I want to remove all lines that contain any of the words in the 'xml_lines' list. I created this script:
from pathlib import Path
# Provide relative or absolute file path to your xml file
filename = './.content.xml'
path = Path(filename)
contents = path.read_text()
xml_lines = [
    'first',
    'second',
]
lines = contents.splitlines()
removed_lines = 0
for line in lines:
    for xml_line in xml_lines:
        if xml_line in line:
            lines.remove(line)
            removed_lines += 1
            print(f'Line: "{line.strip()}" has been removed!')
print(f"\n\n{removed_lines} lines have been removed!")
path.write_text(str(lines))
At the end I have a file that does not look like XML. Can anyone help?
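Two things in the script above probably explain the result. Removing items from a list while looping over that same list makes the loop skip the item after each removal, and `str(lines)` produces the Python representation of the list (brackets, quotes, commas) rather than the original text. A minimal sketch of both effects, using a made-up three-item list and a hypothetical helper `remove_while_iterating`:

```python
def remove_while_iterating(lines, word):
    """Reproduce the loop pattern from the script on a copy of the list."""
    lines = list(lines)
    for line in lines:
        if word in line:
            # Removing shifts later items left, so the next item is skipped.
            lines.remove(line)
    return lines


sample = ['a first', 'b first', 'c']
leftover = remove_while_iterating(sample, 'first')
print(leftover)  # 'b first' survives even though it contains 'first'

# str() of a list is its Python repr, not the file content.
print(str(leftover))
# Joining with newlines gives text that can be written back to a file.
print('\n'.join(leftover))
```

Building a new list of the lines to keep and writing `'\n'.join(...)` avoids both problems.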
Example (before):
<?xml version="1.0"?>
<data>
<country
name="Liechtenstein"
first="2d2md"
second="m3d39d93">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<tiger
name="Singapore"
first="hfdfherbre"
second="m3d39d93">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</tiger>
<car
name="Panama"
first="th54b4"
second="45b45gt45h">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</car>
</data>
If the script finds a line that contains 'first' or 'second', the entire line should be removed. Expected result (after):
<?xml version="1.0"?>
<data>
<country
name="Liechtenstein"
>
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<tiger
name="Singapore"
>
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</tiger>
<car
name="Panama"
>
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</car>
</data>
This is only an example; the entire XML file consists of 9999999 lines...
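For a file of that size, one possible approach is to stream it line by line instead of reading everything into memory. The sketch below is only an illustration under the assumptions visible in the example: each attribute sits on its own line, and a dropped line that ends the opening tag should leave a lone `>` (or `/>`) behind, as in the expected output. The names `filter_lines` and `remove_lines` are hypothetical, and plain substring matching will also drop lines where 'first' or 'second' appears in element text.

```python
def filter_lines(lines, words):
    """Yield the lines that contain none of the given words.

    If a dropped line ended the opening tag, yield the bare closing
    delimiter so the element stays well-formed.
    """
    for line in lines:
        if any(word in line for word in words):
            stripped = line.rstrip()
            if stripped.endswith('/>'):
                yield '/>\n'
            elif stripped.endswith('>'):
                yield '>\n'
            continue
        yield line


def remove_lines(src, dst, words):
    """Copy src to dst, skipping matching lines; reads one line at a time."""
    with open(src, encoding='utf-8') as fin, \
            open(dst, 'w', encoding='utf-8') as fout:
        fout.writelines(filter_lines(fin, words))
```

It could be called as `remove_lines('./.content.xml', './.content.filtered.xml', ['first', 'second'])`, where the output file name is an assumption. Writing to a separate file keeps the original intact until the result has been checked.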