
I'm working on part of a project that replaces http URLs with https URLs where possible.

The problem is that the regular expressions for this are written for the JavaScript regex engine, but I'm using them in Python. To stay compatible, I want to rewrite each regex into a valid Python regex while parsing it.

For example, I'm given this replacement string:

https://$1wikimediafoundation.org/

and I want to turn it into this:

https://\1wikimediafoundation.org/

My problem is that I don't know how to do that (convert $ into \).


This code doesn't work:

'https://$1wikimediafoundation.org/'.replace('$', '\')

It generates the following error:

SyntaxError: EOL while scanning string literal

This code works without error:

'https://$1wikimediafoundation.org/'.replace('$', '\\')

but generates the wrong output:

'https://\\1wikimediafoundation.org/'
pointhi
    Your substitution is correct; you're probably being confused by the way you display the result. Print it out with `print` and you'll only see one backslash. – alexis Sep 14 '14 at 21:01

4 Answers


You can test your regex at https://regex101.com/ and then convert it to Python. Additionally, to insert a matched group into the replacement, you can use the re.sub function along these lines:

re.sub(r"'([^']*)'", r'{\1}', col)

turns a string col containing

'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'

into

{Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}

You can find more information here.

Anu

Actually it works:

>>> 'https://$1wikimediafoundation.org/'.replace('$', '\\')
'https://\\1wikimediafoundation.org/'
>>> print('https://$1wikimediafoundation.org/'.replace('$', '\\'))
https://\1wikimediafoundation.org/

When you evaluate 'https://$1wikimediafoundation.org/'.replace('$', '\\') at the interactive prompt, the interpreter echoes the `__repr__` (representation) of the resulting string, in which special characters such as the backslash are shown escaped.

When you print it, `__str__`, the readable version, is used instead and the backslash appears once. (See this answer on `__str__` vs `__repr__`.)
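A short sketch that makes the difference visible, using only the string from the question. The count shows that the result holds a single backslash character; the doubled backslash exists only in the `repr()` form:

```python
# The substitution from the question: replace "$" with one backslash.
s = 'https://$1wikimediafoundation.org/'.replace('$', '\\')

# repr() is what the interactive prompt echoes: the backslash is shown escaped.
print(repr(s))        # 'https://\\1wikimediafoundation.org/'

# print() uses str(), which shows the actual characters.
print(s)              # https://\1wikimediafoundation.org/

# The string really contains exactly one backslash character.
print(s.count('\\'))  # 1
print(len('\\'))      # 1
```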

Community
fredtantini
    My problem is that I want to change the representation of the string, not the readable version, because I parse this string as a regular expression in the next step. – pointhi Sep 15 '14 at 14:53
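Regarding the comment: the string with one real backslash is already what `re.sub` expects as a replacement template, so no further change is needed. A minimal sketch, where the match pattern is a hypothetical HTTPS rewrite rule (an assumption, not taken from the question). Note that an unmatched optional group is replaced with an empty string only on Python 3.5 and later:

```python
import re

# Replacement string converted as in the question ("$" -> one backslash).
template = 'https://$1wikimediafoundation.org/'.replace('$', '\\')

# Hypothetical rule pattern; group 1 optionally captures "www.".
pattern = r'^http://(www\.)?wikimediafoundation\.org/'

url = 'http://www.wikimediafoundation.org/about'
# re.sub interprets "\1" in the template as a reference to group 1.
print(re.sub(pattern, template, url))
# https://www.wikimediafoundation.org/about
```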

Try this:

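'https://$1wikimediafoundation.org/'.replace('$1', r'\1')

In a raw string the backslash is kept literally, so r'\1' is a backslash followed by 1 and needs no extra escaping. A raw string cannot end with a single backslash, though, so r'\' on its own raises the same SyntaxError as '\'.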
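A quick way to check the raw-string behaviour: r'\1' is the same two characters as '\\1', while r'\' cannot be compiled. The exact SyntaxError message differs between Python versions, so this sketch prints it rather than assuming it:

```python
# In a raw string the backslash is kept as-is: backslash followed by "1".
print(r'\1' == '\\1')  # True
print(len(r'\1'))      # 2

# A raw string cannot end with a single backslash, because the backslash
# still escapes the closing quote; compiling r'\' therefore fails.
try:
    compile("r'\\'", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```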

shep

Note that $& in replacement patterns should be converted to \g<0>, since \0 in a Python replacement template is an octal escape for the null character (\x00), not a reference to the whole match.
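Putting this together, a minimal sketch of a converter from JavaScript-style replacement strings to Python templates. The helper name js_to_py_replacement and the rule pattern are hypothetical, and it assumes only $$, $& and $1 to $9 need handling (other JavaScript tokens such as $` , $' or $<name> are left untouched). It emits \g<n> rather than \n so that a following digit is not read as part of the group number (in Python, \10 would mean group 10):

```python
import re

# JavaScript replacement tokens handled here (assumption: only these matter):
#   $$ -> literal "$", $& -> whole match, $1..$9 -> numbered group
_JS_TOKEN = re.compile(r"\$(\$|&|[1-9])")


def js_to_py_replacement(js_repl):
    """Convert a JavaScript-style replacement string into a Python re.sub template."""
    # Backslashes are ordinary characters in a JS replacement string but are
    # escapes in a Python template, so double them first.
    escaped = js_repl.replace("\\", "\\\\")

    def convert(m):
        token = m.group(1)
        if token == "$":
            return "$"                 # "$$" -> a literal "$"
        if token == "&":
            return r"\g<0>"            # whole match; "\0" would be a NUL character
        return r"\g<" + token + ">"    # "$1" -> "\g<1>", safe before a digit

    # A function replacement's return value is inserted literally,
    # so the "\g<...>" text survives into the resulting template.
    return _JS_TOKEN.sub(convert, escaped)


rule_from = r"^http://(www\.)?wikimediafoundation\.org/"  # hypothetical rule
rule_to = "https://$1wikimediafoundation.org/"

template = js_to_py_replacement(rule_to)
print(template)  # https://\g<1>wikimediafoundation.org/
print(re.sub(rule_from, template, "http://www.wikimediafoundation.org/"))
# https://www.wikimediafoundation.org/
```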

Wiktor Stribiżew
Jack Hack