I am trying to use (?<!\\)# to match every # that is not preceded by a \ (the task is to escape the unescaped #s in a string). This regex works in several online regex validators, but it doesn't work with Python's re module. I also tried escaping the symbols in the regex, but that either raises an error or doesn't produce the expected output.

re.sub("(?<!\\)#","\#",'asd\#fh## #')

How can I modify this regex so that it produces the output asd\\#fh\\#\\# \\# (the backslashes in the output are escaped, hence the double \\)?

Nick
  • Use it like this: `re.sub(r"(?<!\\)#", r"\\#", r'asd\#fh## #')` – anubhava Jul 08 '22 at 17:40
  • @anubhava thanks! didn't know about raw strings. Can you post it as an answer so that I can accept it ? – Nick Jul 08 '22 at 17:42
  • See [Confused about backslashes in regular expressions](https://stackoverflow.com/questions/33582162/confused-about-backslashes-in-regular-expressions). Also, [How python and the regex module handle backslashes](https://stackoverflow.com/a/35797937/3832970). Replacing with backslashes: [Escape special characters in a Python string](https://stackoverflow.com/a/12012114/3832970). – Wiktor Stribiżew Jul 08 '22 at 17:43

1 Answer

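There are a few issues in your code:

  1. 'asd\#fh## #' only keeps its backslash because \# is not a recognized escape sequence, so Python leaves it in place; recent versions warn about such invalid escapes. A raw string makes the intent explicit.
  2. Likewise, "\#" in the replacement relies on the same behavior.
  3. "(?<!\\)#" raises a regex syntax error: in a normal string \\ collapses to a single backslash, so the pattern becomes (?<!\)#, where \) is an escaped literal parenthesis and the negative lookbehind is never closed.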
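To see the third point concretely, here is a small sketch (the variable names are only illustrative) that prints what the regex engine actually receives in each case and shows the non-raw pattern failing to compile:

```python
import re

# Non-raw literal: Python collapses \\ into one backslash before re sees it.
pattern = "(?<!\\)#"
# Raw literal: both backslashes reach the regex engine unchanged.
raw_pattern = r"(?<!\\)#"

print(pattern)      # the 7-character text (?<!\)#
print(raw_pattern)  # the 8-character text (?<!\\)#

# The non-raw version is not a valid regex: \) is a literal parenthesis,
# so the lookbehind group is never closed.
try:
    re.compile(pattern)
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print("re.error:", exc)

# The raw version compiles without complaint.
compiled = re.compile(raw_pattern)
```

Printing the pattern before passing it to re is a quick way to check which backslashes survived the string literal.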

Use raw strings (the r prefix) or double every backslash to get it right:

import re

repl = re.sub(r"(?<!\\)#", r"\#", r'asd\#fh## #')
# repr(repl) == 'asd\\#fh\\#\\# \\#', i.e. print(repl) shows asd\#fh\#\# \#
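For comparison, the double-escaping variant without raw strings might look like the sketch below (the names text and escaped are just for illustration). Each backslash meant for the regex engine or the replacement template is written twice in the string literal:

```python
import re

# Same input as r'asd\#fh## #', written as a normal string literal.
text = "asd\\#fh## #"

# "(?<!\\\\)#" reaches re as (?<!\\)#, and "\\\\#" reaches it as \\#,
# which the replacement template turns into a backslash followed by #.
escaped = re.sub("(?<!\\\\)#", "\\\\#", text)

print(escaped)        # asd\#fh\#\# \#
print(repr(escaped))  # 'asd\\#fh\\#\\# \\#'
```

Both forms should give the same result; the raw-string version is usually easier to read because the pattern looks the same as it does in a regex tester.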
anubhava