Find all occurrences of multiple regex conditions using python regex

Question

Given two different regex patterns, I want to find all occurrences of both. If only pattern 1 matches, return that match; if only pattern 2 matches, return that match; and if both match, return both. How do I run multiple regexes (two in this case) in one statement?

Given the input string:

"https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

I want to get the values of email and secret only, so the expected output is ['12345', 'test@gmail.com'].

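import re

s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
print(re.search(r"(?<=secret=)[^;]+", s).group())  # 12345
print(re.search(r"(?<=email=)[^;]+", s).group())   # test@gmail.com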

I am able to get the expected output by running the regex multiple times. How do I achieve it within a single statement? I don't want to run re.search twice. Can I do this with one search call?

Are you trying to combine the statements into one regex (I would not suggest this) or are you trying to search the string for each occurrence (if that's the case you likely want to use `findall` - see [this](https://stackoverflow.com/questions/9000960/python-regular-expressions-re-search-vs-re-findall) question)? It's not entirely clear. If you could provide expected output it would help. — ctwheels, Sep 26 '19 at 18:17
@ctwheels My expected output is : **['12345', 'test@gmail.com']** — Austin, Sep 26 '19 at 18:25
I would suggest using a proper parser, see [Retrieving parameters from a URL](https://stackoverflow.com/questions/5074803/retrieving-parameters-from-a-url) — ctwheels, Sep 26 '19 at 18:26
Why oh why the regexes. As you can see they are a pain and this can be solved even with a simple `find`. Do not complicate your code unless necessary. — RickyA, Sep 26 '19 at 19:17

score 3 · Answer 1 · answered Sep 26 '19 at 18:30

>>> re.findall(r"((?:(?<=email=)|(?<=secret=))[^;]+)", s)
['12345', 'test@gmail.com']

But now you'll need a way of identifying which of the resulting values is the secret and which is the email. I'd recommend also extracting this information with the regex (which also eliminates the lookbehind):

>>> dict(kv.split('=', 1) for kv in re.findall(r"((?:secret|email)=[^;]+)", s))
{'secret': '12345', 'email': 'test@gmail.com'}
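
A variant of the same idea, as a minimal sketch: with two capture groups, `re.findall` returns `(key, value)` tuples, so no `split` is needed and a value that itself contains `=` stays whole. The helper name `extract_params` and its `keys` parameter are hypothetical, introduced only for illustration; the added `(?<!\w)` lookbehind is an assumption about what you want (it keeps `email` from matching inside a longer name such as `old_email`).

```python
import re

s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

def extract_params(text, keys=("secret", "email")):
    """Return {key: value} for each wanted key found in text (hypothetical helper)."""
    # Build an alternation of the escaped key names, e.g. "secret|email".
    alternation = "|".join(re.escape(k) for k in keys)
    # (?<!\w) rejects a key that is only the tail of a longer parameter name.
    pattern = rf"(?<!\w)({alternation})=([^;]+)"
    # With two groups, findall yields (key, value) tuples, which dict() accepts directly.
    return dict(re.findall(pattern, text))

params = extract_params(s)
print(params)  # {'secret': '12345', 'email': 'test@gmail.com'}
```

If only one of the two keys is present, the resulting dict simply has one entry, which covers the "only pattern 1 matches" case from the question.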

score 1 · Answer 2 · edited Sep 26 '19 at 18:33

import re

print(re.findall(r"(?<=secret=)[^;]+|(?<=email=)[^;]+", s))

# output
# ['12345', 'test@gmail.com']

answered Sep 26 '19 at 18:30 by NybbleStar · edited Sep 26 '19 at 18:33 by zihaozhihao
Jan · Answer 3 · 2019-09-26T18:42:55.213

You could use a dict comprehension:

import re
url = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

rx = re.compile(r'(?P<key>\w+)=(?P<value>[^;]+)')

dict_ = {m['key']: m['value'] for m in rx.finditer(url)}

# ... then afterwards ...
lst_ = [dict_[key] for key in ("secret", "email") if key in dict_]
print(lst_)
# ['12345', 'test@gmail.com']

score 0 · Answer 4 · answered Sep 27 '19 at 21:58

So I ended up using urllib, as suggested by @ctwheels, to mask the email and secret values in the URL:

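from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

input_string = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
url_exclude = ["email", "secret"]  # parameters whose values should be masked

url_parsed_string = urlparse(input_string)
# Python 3.10+ splits the query only on "&" by default, so pass ";" explicitly.
parsed_columns = parse_qs(url_parsed_string.query, separator=";")
for exclude_column in url_exclude:
    if exclude_column in parsed_columns:
        parsed_columns[exclude_column] = "xxxxxxxxxx"
# doseq=True encodes the list values returned by parse_qs as separate parameters
# instead of the string form of each list.
qstr = urlencode(parsed_columns, doseq=True)
base_url = urlunparse((url_parsed_string.scheme, url_parsed_string.netloc,
                       url_parsed_string.path, url_parsed_string.params, qstr,
                       url_parsed_string.fragment))
print(base_url)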
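
For the output the question originally asked for (the values themselves rather than a masked URL), the same parser can be used directly. This is a minimal sketch, assuming an interpreter where `parse_qs` accepts the `separator` keyword (Python 3.10+); the variable names `wanted` and `values` are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

# The question's URL separates parameters with ";". Python 3.10+ splits only
# on "&" unless told otherwise, so the separator is passed explicitly.
query = parse_qs(urlsplit(url).query, separator=";")

# parse_qs maps each key to a list of values; take the first value of each
# wanted key and skip keys that are absent from the URL.
wanted = ("secret", "email")
values = [query[key][0] for key in wanted if key in query]
print(values)  # ['12345', 'test@gmail.com']
```

Unlike the regex versions, the result here follows the order of `wanted`, not the order of the parameters in the URL.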
