Given two different regex patterns, I want to find all occurrences of both. If only pattern 1 matches, return that match; if only pattern 2 matches, return that one; and if both match, return both. How do I run multiple regexes (two, in this case) in one statement?

Given the input string:

"https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

I want to get the values of email and secret only, so the expected output is ['12345', 'test@gmail.com'].

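import re

s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
print(re.search(r"(?<=secret=)[^;]+", s).group())
print(re.search(r"(?<=email=)[^;]+", s).group())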

I can get the expected output by running the regex search twice. How do I achieve it in a single statement? I don't want to call re.search twice.

Wiktor Stribiżew
Austin
  • Are you trying to combine the statements into one regex (I would not suggest this) or are you trying to search the string for each occurrence (if that's the case you likely want to use `findall` - see [this](https://stackoverflow.com/questions/9000960/python-regular-expressions-re-search-vs-re-findall) question)? It's not entirely clear. If you could provide expected output it would help. – ctwheels Sep 26 '19 at 18:17
  • @ctwheels My expected output is : **['12345', 'test@gmail.com']** – Austin Sep 26 '19 at 18:25
  • I would suggest using a proper parser, see [Retrieving parameters from a URL](https://stackoverflow.com/questions/5074803/retrieving-parameters-from-a-url) – ctwheels Sep 26 '19 at 18:26
  • Why oh why the regexes. As you can see they are a pain and this can be solved even with a simple `find`. Do not complicate your code unless necessary. – RickyA Sep 26 '19 at 19:17

4 Answers

3
>>> re.findall(r"((?:(?<=email=)|(?<=secret=))[^;]+)", s)
['12345', 'test@gmail.com']

But now you'll need a way of identifying which of the resulting values is the secret and which is the email. I'd recommend also extracting this information with the regex (which also eliminates the lookbehind):

>>> dict(kv.split('=', 1) for kv in re.findall(r"((?:secret|email)=[^;]+)", s))
{'secret': '12345', 'email': 'test@gmail.com'}
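
A variant of the same idea, offered as a sketch rather than the definitive form: with two capture groups, `re.findall` returns `(key, value)` tuples, which `dict` accepts directly, so no `split` is needed. The leading `[?;&]` is an added guard (not part of the patterns above) that assumes every parameter is preceded by `?`, `;` or `&`; it keeps a key such as `previous_secret` from matching.

```python
import re

s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

# Two capture groups make findall return (key, value) tuples.
# The leading [?;&] ties the key to the start of a parameter (an assumption
# about the URL layout, added for illustration).
pairs = re.findall(r"[?;&](secret|email)=([^;&]+)", s)
found = dict(pairs)

# Build the list in a fixed key order, skipping keys that did not match.
values = [found[k] for k in ("secret", "email") if k in found]
print(found)
print(values)
```

If a key is missing from the URL it is simply absent from `found`, so the list comprehension returns only the values that were actually present.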
evnp
1
import re

print(re.findall(r"(?<=secret=)[^;]+|(?<=email=)[^;]+", s))

# output
# ['12345', 'test@gmail.com']
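
To check the "only one pattern matches" case from the question, here is a small sketch that wraps the same alternation in a helper. The name `extract_values` and the shortened sample URLs are made up for illustration. Note that `findall` returns matches in the order they occur in the string, not in the order of the alternatives.

```python
import re

# Same alternation as above, compiled once.
PATTERN = re.compile(r"(?<=secret=)[^;]+|(?<=email=)[^;]+")

def extract_values(url):
    """Return every secret/email value found in url, in string order."""
    return PATTERN.findall(url)

# Hypothetical sample URLs covering each case from the question.
both = extract_values("https://test.com/p?secret=12345;email=test@gmail.com;new=1")
only_email = extract_values("https://test.com/p?email=test@gmail.com;new=1")
swapped = extract_values("https://test.com/p?email=test@gmail.com;secret=12345")
neither = extract_values("https://test.com/p?new=1")

print(both, only_email, swapped, neither)
```

Because the result is a plain list, nothing in it says which value came from which pattern; if that matters, capturing the key as well is the safer choice.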
zihaozhihao
NybbleStar
1

You could use a dict comprehension:

import re
url = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

rx = re.compile(r'(?P<key>\w+)=(?P<value>[^;]+)')

dict_ = {m['key']: m['value'] for m in rx.finditer(url)}

# ... then afterwards ...
lst_ = [dict_[key] for key in ("secret", "email") if key in dict_]
print(lst_)
# ['12345', 'test@gmail.com']
Jan
0

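I ended up using urllib, as suggested by @ctwheels, to mask the sensitive parameters:

import urllib.parse as urlparse
from urllib.parse import urlencode, urlunparse

input_string = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
url_exclude = ["email", "secret"]

url_parsed_string = urlparse.urlparse(input_string)
# The sample URL separates parameters with ";", so pass separator=";"
# (accepted by parse_qs in Python 3.9.2+; the default separator is "&").
parsed_columns = urlparse.parse_qs(url_parsed_string.query, separator=";")
for exclude_column in url_exclude:
    if exclude_column in parsed_columns:
        parsed_columns[exclude_column] = ["xxxxxxxxxx"]
# parse_qs returns a list of values per key, so doseq=True is needed
# to encode each value rather than the list's string representation.
qstr = urlencode(parsed_columns, doseq=True)
base_url = urlunparse(url_parsed_string._replace(query=qstr))
print(base_url)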
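
For completeness, the same parser can also produce the list asked for in the question. This is a sketch rather than a drop-in solution: it assumes a Python version where `parse_qs` accepts the `separator` keyword (3.9.2 or later), and `wanted` is just an illustrative name.

```python
from urllib.parse import urlparse, parse_qs

input_string = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
wanted = ["secret", "email"]  # illustrative name: keys to extract, in output order

# Split the query on ";" because that is what the sample URL uses.
query = urlparse(input_string).query
params = parse_qs(query, separator=";")

# parse_qs maps each key to a list of values; take the first one per key
# and skip keys that are not present.
values = [params[key][0] for key in wanted if key in params]
print(values)
```

Unlike the regex versions, this also percent-decodes the values, which may or may not be what you want.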
Austin