I need to URL-encode the parts of a string that do not match a regex. My current solution (below) is to:
- select what my regex matches (##.*?##)
- put the found substrings in a list and replace them with placeholders that the encoder leaves alone, such as ~~1~~
- encode everything (the entire URL)
- put back the elements I found
I have this code, and it works, but I'm sure it could be done better with a single pass that looks for the parts of the string not matching my regex. Doing all of this every time adds a lot of overhead.
import re
from itertools import count
import urllib.parse


def replace_parts(url):
    parts = []
    counter = count(0)

    def replace_to(match):
        # Stash the protected ##...## span and leave a numbered placeholder.
        parts.append(match.group(0))
        return '~~' + str(next(counter)) + '~~'

    def replace_from(match):
        # Placeholders come back in the order they were created.
        return parts[next(counter)]

    url = re.sub(r'##(.*?)##', replace_to, url)
    url = urllib.parse.quote(url)
    counter = count(0)  # restart numbering for the restore pass
    url = re.sub(r'~~([0-9]+)~~', replace_from, url)
    return url


url1 = "http://google.com?this_is_my_encodedurl##somethin##&email=##other##tr"
url = replace_parts(url1)
print(url)
# this becomes http%3A//google.com%3Fthis_is_my_encodedurl##somethin##%26email%3D##other##tr
# (quote() leaves '/' unescaped by default; pass safe='' to get %2F instead)
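
For comparison, the single-pass idea could perhaps be sketched with re.split: when the pattern contains a capturing group, the matched ##...## spans are kept in the result list, so only the text between them needs quoting. The name quote_outside_markers is hypothetical, and this is only a sketch that assumes the ##...## markers never nest or overlap.

```python
import re
import urllib.parse

# The capturing group makes re.split keep the ##...## spans in its result.
PROTECTED = re.compile(r'(##.*?##)')


def quote_outside_markers(url):
    # Hypothetical helper: quote only the text outside ##...## markers.
    pieces = PROTECTED.split(url)
    # With one capturing group, odd indexes hold the ##...## matches and
    # even indexes hold the text between them.
    return ''.join(
        piece if i % 2 else urllib.parse.quote(piece)
        for i, piece in enumerate(pieces)
    )


print(quote_outside_markers(
    "http://google.com?this_is_my_encodedurl##somethin##&email=##other##tr"
))
```

This scans the string once and needs no ~~n~~ placeholders, so input that happens to contain text like ~~0~~ is not mistaken for a placeholder.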