I want to remove every line of my output that contains one of the substrings in my "pattern_list", using Python's re module, while keeping the result as a single string (without those lines).
I looked through the re library and wrote the following code:

patterns_to_remove = ["$",":",">"]
patterns = "|".join(patterns_to_remove)
extra_lines_with_patterns = re.findall('\r\n.*{} \\w*'.format(re.escape(patterns)), str(output))
for extra_line in extra_lines_with_patterns:
    output = str(output).replace(extra_line, "")
return output

So if my output is:

$a$
:b:
^c^

I want the output to be:

a
b
c

but I always get None in the end. I guess I did something wrong with the re flags.

ms_stud

1 Answer

You passed the whole joined string to re.escape(patterns), so every | alternation operator turned into a literal pipe, \|. Also, the alternatives are not grouped in the format string: even with each item escaped on its own, the pattern would look like \r\n.*\$|\:|\> \w*, where the alternation splits the entire expression into three unrelated branches instead of applying only to the delimiters (see Why do ^ and $ not work as expected?).
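To see the escaping problem in isolation, here is a minimal sketch that compares the two orders of operations. The variable names joined_then_escaped and escaped_then_joined are only illustrative, and the exact escaped text printed depends on the Python version, so no output is shown:

```python
import re

patterns_to_remove = ["$", ":", ">"]

# Escaping the already-joined string also escapes the "|" separators,
# so the result can only match the literal text "$|:|>".
joined_then_escaped = re.escape("|".join(patterns_to_remove))

# Escaping each item first keeps "|" as the alternation operator.
escaped_then_joined = "|".join(map(re.escape, patterns_to_remove))

print(joined_then_escaped)
print(escaped_then_joined)

# Only the second pattern finds a single delimiter in ordinary text.
print(re.search(joined_then_escaped, "a:b"))
print(re.search(escaped_then_joined, "a:b"))
```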

So you need to

  • Escape each item of patterns_to_remove individually and only then join them: "|".join(map(re.escape, patterns_to_remove))
  • Enclose the {} with a (?:...), non-capturing group, i.e. '\r\n.*(?:{}) \\w*'
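The effect of the missing group can be checked with a short sketch. The sample string "a:b" is made up for illustration; it contains no line break at all, so a correctly grouped pattern should find nothing in it:

```python
import re

# Without a group, "|" splits the WHOLE pattern into three branches:
#   \r\n.*\$    or    :    or    > \w*
ungrouped = re.compile(r"\r\n.*\$|:|> \w*")

# With a non-capturing group, "|" only separates the delimiters.
grouped = re.compile(r"\r\n.*(?:\$|:|>) \w*")

text = "a:b"  # no "\r\n" anywhere

print(ungrouped.findall(text))  # the bare ":" branch matches on its own
print(grouped.findall(text))    # nothing matches without a preceding "\r\n"
```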

Use

re.findall('\r\n.*(?:{}) \\w*'.format("|".join(map(re.escape, patterns_to_remove))), str(output))

Or, since you are removing matches, just use re.sub:

patterns_to_remove = ["$",":",">"]
output = re.sub('\r\n.*(?:{}) \\w*'.format("|".join(map(re.escape, patterns_to_remove))), '', str(output))

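NOTE: as regex patterns, '\r\n.*(?:{}) \\w*' and r'\r\n.*(?:{}) \w*' are equivalent. In the raw string, the re module itself interprets \r and \n as carriage return and line feed.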
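Putting it together, here is a self-contained sketch of the re.sub approach. The function name remove_prompt_lines and the sample text are assumptions made for the example; the regex is the one shown above:

```python
import re

def remove_prompt_lines(output, patterns_to_remove):
    # Hypothetical helper name; the pattern is the one from this answer.
    # Escape each delimiter on its own, then join with "|".
    alternation = "|".join(map(re.escape, patterns_to_remove))
    pattern = r'\r\n.*(?:{}) \w*'.format(alternation)
    # re.sub returns the new string, so the result must be returned or assigned.
    return re.sub(pattern, '', str(output))

# Made-up sample: two prompt-like lines around one ordinary line.
sample = "header\r\nuser@host:~$ ls\r\nfile.txt\r\n>>> done"
cleaned = remove_prompt_lines(sample, ["$", ":", ">"])
print(repr(cleaned))  # 'header\r\nfile.txt'
```

Keep in mind what this pattern requires: the line must be preceded by \r\n, the delimiter must be followed by a space, and the match ends after the first word that follows it, so any text after that word stays in the string.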

Wiktor Stribiżew