0

I have raw HTML and am trying to remove every block of the form [%~ ... ~%] from the output string, using Python's re library.

import re

teststring = """Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions."""
cleanM = re.compile('\[\%\~ .*? \~\%\]')
scleantext = re.sub(cleanM,'', teststring)

What is wrong with the code?

pushkar rawat

2 Answers

1

Your pattern should be

cleanM = re.compile(r'\[\%\~ .*? \~\%\]',re.S)

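By default, . matches any character except a newline; the re.S flag (also spelled re.DOTALL) makes it match newlines as well, so the pattern can span the multi-line block.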
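To make the difference concrete, here is a minimal sketch that runs the same pattern with and without re.S on the string from the question. The variable names no_flag, with_flag, unchanged and cleaned are only illustrative.

```python
import re

# Sample input taken from the question; the macro block spans three lines.
teststring = """Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions."""

# Without re.S, '.' cannot cross a newline, so the pattern never reaches '~%]'.
no_flag = re.compile(r'\[%~ .*? ~%\]')
# With re.S (alias re.DOTALL), '.' also matches '\n'.
with_flag = re.compile(r'\[%~ .*? ~%\]', re.S)

unchanged = no_flag.sub('', teststring)
cleaned = with_flag.sub('', teststring)

print(unchanged == teststring)  # True: nothing was removed
print(cleaned)
```

Note that the spaces on either side of the removed block are left in place, so the cleaned string has two consecutive spaces where the block used to be.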

mkHun
  • The caveat is that you need to use `re.compile` when you want to use re.S. It does not work directly in re.sub for whatever reason... – mrCarnivore Dec 01 '17 at 11:29
  • You can also exclude the markers from the match: `r'(?<=\[%~ ).*(?= \~%])'`. BTW: Always use raw strings (`r'...'`) on regular expressions. – Klaus D. Dec 01 '17 at 11:31
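Regarding the re.S caveat in the comments: re.sub does accept flags without re.compile, but they have to be passed by keyword, because the fourth positional parameter of re.sub is count, not flags. A short sketch, using a made-up input string (text is not from the question):

```python
import re

# Hypothetical two-line input, only for illustration.
text = "a [%~ x\ny ~%] b"
pattern = r'\[%~ .*? ~%\]'

# Pass the flag by keyword; positionally, the fourth argument would be count.
by_keyword = re.sub(pattern, '', text, flags=re.S)

# Alternatively, switch DOTALL on inside the pattern with the inline (?s) flag.
inline = re.sub(r'(?s)' + pattern, '', text)

print(by_keyword)
print(inline)
```

Both calls should remove the block without a compiled pattern object.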
0

You can use [\S\s] in place of . so that the match also spans newlines, and you can leave out compile:

import re
teststring = '''Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions.'''
# Raw string, and a lazy quantifier so each block is matched separately
scleantext = re.sub(r'\[%~ [\S\s]*? ~%\]', '', teststring)

print(scleantext)
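One thing to watch with [\S\s]*: a greedy quantifier runs to the last ~%] in the string, so if the input holds more than one block, the text between the blocks is removed as well. A small sketch with a made-up input containing two blocks (two_blocks is hypothetical, not from the question):

```python
import re

# Hypothetical input with two macro blocks, each spanning a newline.
two_blocks = "A [%~ one\n ~%] B [%~ two\n ~%] C"

# Greedy: matches from the first '[%~' to the LAST '~%]', swallowing ' B '.
greedy = re.sub(r'\[%~ [\S\s]* ~%\]', '', two_blocks)

# Lazy: stops at the nearest '~%]', so each block is removed on its own.
lazy = re.sub(r'\[%~ [\S\s]*? ~%\]', '', two_blocks)

print(repr(greedy))
print(repr(lazy))
```

With a single block, as in the question, both forms give the same result; the lazy form is the safer default when the HTML may contain several blocks.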
mrCarnivore