
A file has lines like:

$ cat build-log.txt
2018-02-23T10:08:52.856946Z|cov-internal-capture|6812|info|> EXECUTING: "C:\ghs\comp_201416\asarm.exe" -elf -b0 [ skip ]

I spent an hour or more trying to work out the right pattern for Python's re.search. The goal was just to extract the compiler name ('asarm' in this instance), and I ended up with

m = re.search('C:\\\\ghs\\\\comp_201416\\\\([a-z]*)\\.exe', line)

in my Python code, which works (at least m.group(1) gives 'asarm').

Please explain why four backslashes are needed to match just one backslash in the file.

Using

Python 2.7
Mac OS X High Sierra
Elpy
GNU Emacs 25.3.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21 Version 10.9.5 (Build 13F1911))
[ https://emacsformacosx.com ]
jonrsharpe
  • 2
    Because you're escaping them twice; once in the Python string literal syntax, once in the regular expression. Maybe read [the relevant docs](https://docs.python.org/2/library/re.html), this is explained in the opening paragraphs. – jonrsharpe Feb 25 '18 at 09:53
  • 2
    Possible duplicate of [Backslashes in Python Regular Expressions](https://stackoverflow.com/questions/33582162/backslashes-in-python-regular-expressions) – jonrsharpe Feb 25 '18 at 09:57

1 Answer


If you write the pattern as a raw string (the r prefix), you only need to escape things once, for the regular expression itself.

regex = r"C:\\ghs\\comp_201416\\([a-z]*)\.exe"

Each literal \ is escaped once for the regex, so it is written \\. In .exe only the . needs escaping, so it becomes \.
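As a quick check, here is a minimal sketch showing that the four-backslash version from the question and the raw-string version are the same pattern once Python has parsed them. The variable names (line, plain, raw, same_pattern, compiler) are just illustrative, and the log line is the sample from the question.

```python
import re

# Sample log line from the question, written as a raw string so the
# backslashes in the path are stored exactly as they appear in the file.
line = r'2018-02-23T10:08:52.856946Z|cov-internal-capture|6812|info|> EXECUTING: "C:\ghs\comp_201416\asarm.exe" -elf -b0 [ skip ]'

# Plain literal: Python turns each \\ into one backslash, so \\\\ becomes
# the two characters \\ that the regex engine reads as "one literal backslash".
plain = 'C:\\\\ghs\\\\comp_201416\\\\([a-z]*)\\.exe'

# Raw literal: Python leaves backslashes alone, so only the regex-level
# escaping has to be written.
raw = r'C:\\ghs\\comp_201416\\([a-z]*)\.exe'

# Both literals produce the same string, hence the same pattern.
same_pattern = (plain == raw)

# Extract the compiler name from the log line.
compiler = re.search(raw, line).group(1)

print(same_pattern)
print(compiler)
```

Printing the pattern itself (print(plain)) is a handy way to see exactly what the regex engine receives.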

l'L'l
  • @usr2564301: Why wouldn't it work, did you try it? looks fine to me: https://repl.it/repls/LumberingBluevioletSoftwaresuite – l'L'l Feb 25 '18 at 10:06
  • 1
    (After trying) Ow! Caught by the meaning of `r`... It removes the need for doubling in *Python* strings but of course they keep their intended GREP meaning ... – Jongware Feb 25 '18 at 10:17
  • 1
    The Regexp language sits inside the Python string 'language'. Complicated. '\' is special in both, so if I want it to pass through unchanged I have to quote it twice. '.' on the other hand is special only for Regexp, so it has to be quoted only once. I understand that r"" turns off all special handling for Python strings, so only the quoting for Regexp remains. – Vladimir Zolotykh Feb 25 '18 at 18:44
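
The two layers described in that last comment can be checked directly. This is only a small sketch; the names (python_layer, raw_layer and so on) and the test strings 'aXexe' and 'a.exe' are made up for illustration.

```python
import re

# Layer 1, Python string syntax: four typed backslashes are stored as two.
python_layer = '\\\\'
# A raw string skips layer 1: two typed backslashes are stored as two.
raw_layer = r'\\'
stored_length = len(python_layer)
same_string = (python_layer == raw_layer)

# Layer 2, the regex: those two stored backslashes match ONE backslash.
# ('\\' below is a Python string holding a single backslash.)
matches_one_backslash = re.match(python_layer + '$', '\\') is not None

# '.' is special only at the regex layer, so it needs a single escape.
dot_unescaped = re.match(r'a.exe$', 'aXexe') is not None   # '.' matches any character
dot_escaped = re.match(r'a\.exe$', 'aXexe') is not None    # '\.' matches only a dot
dot_literal = re.match(r'a\.exe$', 'a.exe') is not None

print(stored_length)
print(matches_one_backslash)
print(dot_unescaped)
print(dot_escaped)
```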