
I'm trying to affiliatize some legacy text in a Django webapp. The scope is pretty simple: the text contains some Amazon URLs, and I want to munge my ?tag=xxx identifier onto the end of them.

I've written a template filter that I can quickly pass my text through but I'm slightly stuck on writing the regex logic.

import re

def affiliatize(t):  # hypothetical filter name; t is the text from the template engine
    return re.sub(r'(https?://(?:www\.)?amazon\.co\.uk\S+)', r'\1?tag=xxx', t)

This seems to work at a basic level, but if the URL already has a query string (as lots of organic Amazon URLs do by default), I need an ampersand instead of a question mark.

There might be a quick way to detect two question marks and replace the second. I'm open to that suggestion.

What I'm really looking for is a regex replace where I can pass the found string off to another method (in which I can detect existing question marks) that is expected to return the replacement string. Something like PHP's preg_replace_callback (et al.). Does that exist?

Oli

3 Answers


Yes, the second parameter to re.sub can be a function, which takes a match object and returns a string. See the documentation.
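A minimal sketch of that approach, reusing the pattern from the question. `add_tag`, `affiliatize` and the literal `tag=xxx` are placeholder names and values, and it assumes the tag should simply be appended to the end of each matched URL:

```python
import re

# Same pattern as in the question: an amazon.co.uk URL up to the next whitespace.
AMAZON_URL = re.compile(r'https?://(?:www\.)?amazon\.co\.uk\S+')


def add_tag(match):
    """Called by re.sub once per match; returns the replacement text."""
    url = match.group(0)
    # Join with '&' if the URL already has a query string, otherwise '?'.
    separator = '&' if '?' in url else '?'
    return url + separator + 'tag=xxx'


def affiliatize(text):
    # A callable repl plays the role of PHP's preg_replace_callback.
    return AMAZON_URL.sub(add_tag, text)


print(affiliatize('See http://www.amazon.co.uk/dp/123 and '
                  'http://amazon.co.uk/gp/product/456?ie=UTF8 for details.'))
# See http://www.amazon.co.uk/dp/123?tag=xxx and http://amazon.co.uk/gp/product/456?ie=UTF8&tag=xxx for details.
```

One caveat of `\S+`: it also swallows trailing punctuation, so a URL followed directly by a full stop would get the tag appended after the stop.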

Daniel Roseman

There might be a quick way to detect two question marks and replace the second. I'm open to that suggestion.

This will match the two question marks:

(\?|(\?\?))

I believe the capturing (non-passive) group number for the double question mark will be $4, but you'll need to double-check that. You can then add the single question mark back in your replacement.
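For the "detect two question marks and replace the second" route, here is a sketch that uses a different pattern from the one above: keep the question's blind append, then run a second substitution that turns the appended `?` into `&` wherever the URL already had a `?`. It assumes the literal `?tag=xxx` appears nowhere else in the text and that the tag value contains no regex metacharacters; `affiliatize` is a hypothetical name.

```python
import re

# The question's blind append, with the replacement written as r'\1?tag=xxx'.
AMAZON_URL = re.compile(r'(https?://(?:www\.)?amazon\.co\.uk\S+)')
# A '?' followed, with no whitespace or other '?' in between, by the appended
# '?tag=xxx' means that URL already had a query string.
SECOND_QUESTION_MARK = re.compile(r'(\?[^\s?]*)\?tag=xxx')


def affiliatize(text):
    # Pass 1: append '?tag=xxx' to every Amazon URL.
    text = AMAZON_URL.sub(r'\1?tag=xxx', text)
    # Pass 2: where that produced a second '?', turn the appended one into '&'.
    return SECOND_QUESTION_MARK.sub(r'\1&tag=xxx', text)
```

Because `[^\s?]` cannot cross whitespace, the second pass never joins two neighbouring URLs into one match.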

Keng

Once you find the URL, you're better off parsing it properly rather than hacking it into a regex.
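A minimal sketch of that idea with Python 3's `urllib.parse`, using the question's regex only to find the URLs and a `re.sub` callback to rewrite each one. `add_tag` and `affiliatize` are hypothetical names, and replacing any existing `tag` parameter (rather than adding a second one) is a design choice assumed here:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

AMAZON_URL = re.compile(r'https?://(?:www\.)?amazon\.co\.uk\S+')


def add_tag(url, tag='xxx'):
    """Return url with its query string's 'tag' parameter set to `tag`."""
    parts = urlsplit(url)
    # Parse the existing query into (key, value) pairs, dropping any old tag.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'tag']
    query.append(('tag', tag))
    # Rebuild the URL with the new query; path and fragment are untouched.
    return urlunsplit(parts._replace(query=urlencode(query)))


def affiliatize(text):
    # The regex only finds the URLs; the callback does the URL surgery.
    return AMAZON_URL.sub(lambda m: add_tag(m.group(0)), text)
```

Parsing also puts the tag before any `#fragment`, which a plain string append would not. Be aware that `urlencode` re-encodes the query, so percent-escapes in the original URL may come out slightly differently (for example, `%20` becomes `+`).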

Ignacio Vazquez-Abrams
    Sure. But that's my problem. How do I find-and-replace in a way that lets me interfere in the middle? – Oli Nov 30 '10 at 15:01