1

I'm trying to do a search and replace (for multiple chars) in the following string:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&

One or more of these characters: %3D, %2F, %2B, %23, can be found anywhere (beginning, middle, or end of the string) and ideally, I'd like to search for all of them at once (using one regex) and replace them with = or / or + or # respectively, then return the final string.

Example 1:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&

Should return

VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&

Example 2:

VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&

Should return

VAR=s2P0n6I/lonpj6uCKvYn8PCjp/4PUE2TPsltCdmA=RQPY=&
Jerry
  • 70,495
  • 13
  • 100
  • 144
My PM
  • 11
  • 1
  • 2
  • 3
    What language are you doing this in? Your language probably has a native way to urldecode. – Joseph Silber Sep 15 '13 at 20:18
  • I'm pretty sure regex can't do this. – Bernhard Barker Sep 15 '13 at 20:21
  • A single regex? You'll need a function to pick which set of chars to replace with the corresponding set of chars as well. – Jerry Sep 15 '13 at 20:21
  • Python, BUT what I'm looking for is a simple expression, not lines of code or scripts. Something like this:'VAR':\s*'([^']*)' something that will step through the string, find and replace those characters. I found this, but I don't know how to apply it to my case: http://stackoverflow.com/questions/14059129/search-and-replace-with-regex-with-replace-variables – My PM Sep 15 '13 at 20:24
  • That question is different. The variables replaced are already in the string, while yours aren't. – Jerry Sep 15 '13 at 20:29
  • You're correct :( One has to have some sort of look-up table, then find thoses chars and replace them. Any idea how to do that? Thanks again – My PM Sep 15 '13 at 20:31
  • Are those the only characters that need to be replaced? – Explosion Pills Sep 15 '13 at 20:31
  • so far yes, those are the only ones I've seen. – My PM Sep 15 '13 at 20:34
  • Why regex? Is there some special reason? – Ofir Israel Sep 15 '13 at 20:38
  • That's what the plugin uses. regex/python – My PM Sep 15 '13 at 20:42
  • A regex replace can't do it natively, but you may be able to put code in the replacement that uses a match variable to look up the captured sequence in a hash that maps each sequence to its replacement. I don't know Python, but here's a solution in Perl (which I gave to someone who asked how to do it in PowerShell, because PS can't do this): http://stackoverflow.com/questions/17702046/powershell-multiple-string-replacement-efficiency/17715700#17715700. If Python has the same capability, you may be able to reproduce this solution. But I'm sure there's also a method to decode URL strings. :) – Adi Inbar Sep 15 '13 at 20:51

5 Answers5

2

I'm not convinced you need regex for this, but it's fairly easy to do with Python:

x = 'VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&'

import re

MAPPING = { 
    '%3D': '=',
    '%2F': '/',
    '%2B': '+',
    '%23': '#',
}

def replace(match):
    return MAPPING[match.group(0)]

print x
print re.sub('%[A-Z0-9]{2}', replace, x)

Output:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&
VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&
Wolph
  • 78,177
  • 11
  • 137
  • 148
  • That's great and I could follow what you did, but is there a way to do this using some slick regex or is it just not possible? Cheers Mate! – My PM Sep 15 '13 at 20:51
  • Look at my answer, is that what you call "slick"? – Ofir Israel Sep 15 '13 at 20:58
  • @MyPM: since there is no simple metohd to convert the hex numbers into characters with the Python regex engine, no. You will always need to use a function like Ofir and I did or use multiple replaces like CodeChords did. – Wolph Sep 15 '13 at 21:16
2

There is no need for a regex to do that in your example. A simple replace method will do:

def rep(s):
    for pat, txt in [['%2F','/'], ['%2B','+'], ['%3D','='], ['%23','#']]:
        s = s.replace(pat, txt)
    return s
Israel Unterman
  • 13,158
  • 4
  • 28
  • 35
2

I'm also not convinced you need regex, but there's a better way to do url-decode with regex. Basically you need that every string in the pattern of %XX will be converted into the char it represents. This can be done with re.sub() like so:

>>> VAR="%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&"
>>> re.sub(r'%..', lambda x: chr(int(x.group()[1:], 16)), VAR)
'/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&'

Enjoy.

Ofir Israel
  • 3,785
  • 2
  • 15
  • 13
1
var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = var.replace("%2F", "/")
var = var.replace("%2B", "+")
var = var.replace("%3D", "=")

but you got same result with urllib2.unquote

import urllib2
var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = urllib2.unquote(var)
Sérgio
  • 6,966
  • 1
  • 48
  • 53
-1

This can't be done with a regex because there's no way to write any kind of conditional inside of a regex. Regular expressions can only answer the question "Does this string match this pattern?" and not perform the operation "If this string matches this pattern, replace part of it with this. If it matches this pattern, replace it with this. etc..."

Patrick Collins
  • 10,306
  • 5
  • 30
  • 69