I have a JSON file where I store a mapping, which contains regexes, like the ones below:

"F(\\d)": "field-\\\\1",
"FLR[ ]*(\\w)": "floor-\\\\1",

To comply with the JSON standard I escape the backslashes; the actual regexps should contain \d, \w, and \\1.

Once I read this JSON with json.load() I still need to post-process the resulting dictionary to get the correct regexps: I need to replace each \\ with \. What's the best way to do this?

So far I have tried both re.sub() and str.replace(), and in both cases it's not clear how to represent a single backslash in the substitution.

For example, I don't understand why the following doesn't produce a single backslash:

In [76]: "\\\\d".replace("\\\\", "\\")
Out[76]: '\\d'
Nikolay Derkach
    It *does* produce a single backslash. That's just how it's displayed, to make it clear it's a literal backslash not an escape character – jonrsharpe Sep 12 '16 at 17:08

1 Answer

It does produce a single backslash; that backslash is escaped when the string is displayed. The REPL echoes the string's repr, which escapes special characters so that every character, including ones with no printable form, is shown unambiguously. Otherwise, you wouldn't know whether a backslash was a literal backslash or was escaping the following character.

This can be demonstrated by checking the individual characters:

# In a terminal/REPL:
>>> "\\\\d".replace("\\\\", "\\")[0]
'\\'
>>> "\\\\d".replace("\\\\", "\\")[1]
'd'
>>> "\\\\d".replace("\\\\", "\\")[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
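
Another way to see the same thing, as a small sketch rather than anything from the original post, is to compare len(), repr() and print() on the result. The variable names here are only illustrative.

```python
s = "\\\\d".replace("\\\\", "\\")

# len() counts real characters: one backslash plus 'd'.
length = len(s)

# repr() is what the REPL echoes; it doubles the backslash for display.
shown = repr(s)

print(length)  # 2
print(shown)   # '\\d'
print(s)       # \d
```

print() writes the characters themselves, so it is usually the quickest check when the echoed form looks wrong.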

One tip for doing regexes in Python: use raw strings. If you put an r before the first quote of a string literal, backslashes won't escape anything (except for an ending quote). r"\n" is a string containing two characters, a \ and an n, equivalent to "\\n". They're very helpful when working with regexes and anything else where the backslashes need to reach the consumer intact. See also: What exactly do “u” and “r” string flags do in Python, and what are raw string literals?
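
A minimal sketch of the raw-string tip, reusing the F(\d) pattern from the question and assuming the replacement is meant to refer to group 1:

```python
import re

# Both literals are the same two-character string: a backslash and an 'n'.
same = r"\n" == "\\n"

# With raw strings the regex reads exactly as the re module sees it.
pattern = r"F(\d)"
replacement = r"field-\1"  # \1 refers to the first captured group

result = re.sub(pattern, replacement, "F1")
print(same, result)  # True field-1
```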

Vivian
  • Makes sense, I still have the problem making those regexes work. For example: `In [24]: re.sub("F(\\d)", "field-\\\\1", "F1") Out[24]: 'field-\\1'` – Nikolay Derkach Sep 12 '16 at 17:56
  • @NikolayDerkach That seems to be working exactly as it ought to. That call resolves to "in the string `"F1"`, replace all occurrences of `"F"` followed by a single digit with `"field-"` followed by a backslash followed by `"1"`". And that's what it does. The middle argument is escaped once by the string literal, so `re` sees two backslashes, which it unescapes again to produce one literal backslash. If you meant to refer to group 1, that would be `"field-\\1"`. – Vivian Sep 12 '16 at 18:21
  • @NikolayDerkach A tip for doing regexes in python: Use raw strings. If you put an `r` before the first quote of a string literal, backslashes won't escape anything (except for an ending quote). `r"\n"` is a string containing two characters, a backslash and an n, equivalent to `"\\n"`. When working with regexes and other things where you need to send escape sequences, they're very helpful. I'm editing this into the answer. – Vivian Sep 12 '16 at 18:24
  • Makes sense, somehow I thought that group matching regex is `\\1` rather than `\1` – Nikolay Derkach Sep 12 '16 at 21:19
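
Putting the thread together, here is a rough end-to-end sketch. It assumes the replacement is meant to reference group 1, so the JSON values are written with two backslashes (`\\1`) rather than the four shown in the question. The `normalize` helper and `raw_json` name are hypothetical, not from the original post. Under that assumption no post-processing is needed after json.loads().

```python
import json
import re

# JSON text as it would appear in the file: each backslash is doubled once.
raw_json = r'''
{
    "F(\\d)": "field-\\1",
    "FLR[ ]*(\\w)": "floor-\\1"
}
'''

# json.loads() undoes the JSON escaping, leaving single backslashes.
mapping = json.loads(raw_json)

def normalize(text, mapping):
    """Rewrite text using the first pattern that matches it entirely (hypothetical helper)."""
    for pattern, replacement in mapping.items():
        if re.fullmatch(pattern, text):
            return re.sub(pattern, replacement, text)
    return text  # no pattern matched; return the input unchanged

print(normalize("F1", mapping))     # field-1
print(normalize("FLR 2", mapping))  # floor-2
```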