I am trying to remove tags in text that are identified by a backslash. For example, for the phrase 'Hello \tag world', I'd like to return the phrase 'Hello world'. I've tried the following but it doesn't get rid of the '\tag'.

print re.sub('\\[A-Za-z]+',' ',text)

I'm sure it's something simple, but I can't seem to figure it out.

Thanks for any help you can give!

myname
  • 1
    Use raw strings for regexes. ``\\`` puts a literal backslash in your regex, but a literal backslash in your regex doesn't match a literal backslash in `text` - it's treated as an escape character by the regex engine. You need the extra layer of escaping that raw strings provide. – user2357112 Feb 20 '17 at 20:06

1 Answer

The pattern must be:

re.sub('\\\\[A-Za-z]+',' ',text)

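Otherwise, the string '\\' reaches the regex engine as a single backslash, which the engine treats as its own escape character: it escapes the '[' that follows, so the pattern looks for a literal '[' instead of a backslash followed by letters.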
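To make the difference concrete, here is a minimal sketch comparing the original pattern, the doubled-backslash pattern, and the raw-string form suggested in the comments. It assumes `text` really contains a backslash followed by `tag` (written as a raw string here, since a plain `'\t'` would be a tab). The variable names and the final whitespace-collapsing variant are my own additions for illustration, not part of the question.

```python
import re

# Raw string: \t stays a backslash plus 't' instead of becoming a tab.
text = r'Hello \tag world'

# Doubled escaping: Python turns '\\\\' into two backslashes,
# and the regex engine reads those as one literal backslash.
escaped = re.sub('\\\\[A-Za-z]+', ' ', text)

# Same pattern as a raw string: only the regex-level escape is needed.
raw = re.sub(r'\\[A-Za-z]+', ' ', text)

# Original attempt: the engine sees \[ (a literal '['), so nothing matches.
broken = re.sub('\\[A-Za-z]+', ' ', text)

print(escaped == raw)   # both forms behave the same
print(broken == text)   # the original pattern leaves the text unchanged

# Assumed variant: also swallow the surrounding whitespace so that
# only a single space is left where the tag used to be.
collapsed = re.sub(r'\s*\\[A-Za-z]+\s*', ' ', text)
print(collapsed)
```

Note that replacing the tag with `' '` keeps the spaces on either side of it, so the first two calls leave extra spaces between the words. The last variant is one way to get back to 'Hello world'.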

DYZ
  • 1
    While that is a valid option, raw string notation is usually more convenient. – user2357112 Feb 20 '17 at 20:08
  • 1
    @user2357112 that's entirely a matter of opinion. This answer is perfectly valid and [this](http://stackoverflow.com/questions/33582162/backslashes-in-python-regular-expressions) would have been a better duplicate question. – miken32 Feb 20 '17 at 20:12