
I have an XML file that contains this:

<supported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
 <lang><![CDATA[de_DE]]></lang>
</supported-languages>

<2ndsupported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
 <lang><![CDATA[de_DE]]></lang>
</2ndsupported-languages>

I want to delete every line that contains de_DE and then save the file.

So far I have this:

import fileinput
import sys

file = r"C:\Users\Desktop\file.xml"  # raw string, so "\f" is not read as a form feed
searchExp = "de_DE"
replaceExp = ""


def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

replaceAll(file,searchExp,replaceExp)

Close, but not quite. It finds "de_DE", but only replaces that substring with an empty string and leaves the rest of the line in place. This is the result:

<supported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
 <lang><![CDATA[]]></lang>
</supported-languages>

<2ndsupported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
 <lang><![CDATA[]]></lang>
</2ndsupported-languages>

I want the result to look like this:

<supported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
</supported-languages>

<2ndsupported-languages>
 <lang><![CDATA[en_US]]></lang>
 <lang><![CDATA[es_ES]]></lang>
</2ndsupported-languages>

How do I do this?

I also tried importing re and using pattern = "^.*de_DE.*$" as the search expression, but that did not work.

martineau
  • If you need to remove a line containing a literal substring, you don't need a regex. Use `for line in fileinput.input(file, inplace=1):`, then `if 'de_DE' not in line:`, and write only those lines back out. – Wiktor Stribiżew Aug 28 '19 at 21:25
  • Manipulating XML with regex is very often a bad idea. Python has built-in XML libraries; use them. – tripleee Aug 29 '19 at 08:02

2 Answers


Write only the lines that don't contain the substring searchExp:

import fileinput
import sys

def replaceAll(file, searchExp):
    # With inplace=1, anything written to stdout goes back into the file.
    for line in fileinput.input(file, inplace=1):
        if searchExp not in line:
            sys.stdout.write(line)
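
As a rough check of the filtering approach above, here is a minimal sketch that runs it against a throwaway temporary file instead of the real one. The function name `remove_lines_containing` and the sample content are assumptions made for illustration; only `fileinput.input(..., inplace=...)` and the substring test come from the answer.

```python
import fileinput
import os
import sys
import tempfile


def remove_lines_containing(path, needle):
    """Rewrite the file at `path` in place, dropping lines that contain `needle`."""
    # With inplace=True, fileinput redirects sys.stdout into the file being read.
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            if needle not in line:
                sys.stdout.write(line)


# Throwaway sample file (hypothetical content) so the sketch touches no real data.
sample = (
    "<supported-languages>\n"
    " <lang><![CDATA[en_US]]></lang>\n"
    " <lang><![CDATA[es_ES]]></lang>\n"
    " <lang><![CDATA[de_DE]]></lang>\n"
    "</supported-languages>\n"
)
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as handle:
    handle.write(sample)

remove_lines_containing(demo_path, "de_DE")

# Read the rewritten file back, then clean up the temporary file.
with open(demo_path) as handle:
    result = handle.read()
os.remove(demo_path)

print(result, end="")
```

Because the whole line is skipped rather than edited, no empty `<lang><![CDATA[]]></lang>` element is left behind.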
Prem Anand

Don't waste your time reading the file line by line.
Read the whole file into a string at once.
Work on the string using the regex below, then seek back to the start of the
file handle, write the string to the file, and truncate whatever is left over.

Advantages:
- You have gigabytes of RAM, so holding the whole file in memory is not a problem.
- You can alter the regex to meet any future search changes
that require spanning lines, for instance \[\s*de_DE\s*\], without
needing to modify any of the surrounding code.
- If you later decide to match on the XML tags themselves, you can do so without
tag content being split up by where the line breaks happen to fall.
(I can give you a regex to accomplish this if you need it.)


Do a re.sub() with an empty replacement string, using this pattern:

.*<!\[CDATA\[de_DE\]\]>.*(?:\r?\n)?

https://regex101.com/r/xy0AHj/1
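
The whole-file approach described above might look roughly like the following sketch. The pattern is the one given in this answer; the function name `strip_matching_lines`, the constant name, and the temporary sample file are assumptions added for illustration, and the file is assumed to be small enough to read into memory.

```python
import os
import re
import tempfile

# Pattern from the answer: the entire line holding the de_DE CDATA entry,
# plus its trailing line ending if there is one.
LINE_WITH_DE = re.compile(r".*<!\[CDATA\[de_DE\]\]>.*(?:\r?\n)?")


def strip_matching_lines(path, pattern=LINE_WITH_DE):
    """Read the whole file, delete every match of `pattern`, write it back."""
    # newline="" turns off newline translation, so \r\n files round-trip as-is.
    with open(path, "r+", newline="") as handle:
        text = handle.read()
        handle.seek(0)                       # reset the file handle
        handle.write(pattern.sub("", text))
        handle.truncate()                    # new text is shorter; drop the old tail


# Throwaway sample file (hypothetical content) so the sketch touches no real data.
sample = (
    "<supported-languages>\n"
    " <lang><![CDATA[en_US]]></lang>\n"
    " <lang><![CDATA[es_ES]]></lang>\n"
    " <lang><![CDATA[de_DE]]></lang>\n"
    "</supported-languages>\n"
)
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w", newline="") as handle:
    handle.write(sample)

strip_matching_lines(demo_path)

# Read the rewritten file back, then clean up the temporary file.
with open(demo_path, newline="") as handle:
    cleaned = handle.read()
os.remove(demo_path)

print(cleaned, end="")
```

The `truncate()` call matters here: the rewritten text is shorter than the original, so without it the end of the old content would remain in the file.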