Split regex strings into new column in pandas

Question

I have a redirect file in a pandas dataframe with a number of regex "or" expressions.

regex_no	regex
Regex4	/shop/accessories/jewellery/necklaces/(brand-\|)jon-richard/
Regex5	/shop/accessories/jewellery/(bracelets\|necklaces\|)/brand-simply-silver-by-jon-richard/
Regex245	/shop/(fashion/dresses/occasion-dresses/\|)bridesmaid/

I'm looking to build a testUrl column that expands each regex into every URL it can match, one row per URL, so I can run automated tests against them. It would look like this:

regex_no	regex	testUrl
Regex4	/shop/accessories/jewellery/necklaces/(brand-\|)jon-richard/	/shop/accessories/jewellery/necklaces/brand-jon-richard/
Regex4	/shop/accessories/jewellery/necklaces/(brand-\|)jon-richard/	/shop/accessories/jewellery/necklaces/jon-richard/
Regex5	/shop/accessories/jewellery/(bracelets\|necklaces\|)/brand-simply-silver-by-jon-richard/	/shop/accessories/jewellery/bracelets/brand-simply-silver-by-jon-richard/
Regex5	/shop/accessories/jewellery/(bracelets\|necklaces\|)/brand-simply-silver-by-jon-richard/	/shop/accessories/jewellery/brand-simply-silver-by-jon-richard/
Regex5	/shop/accessories/jewellery/(bracelets\|necklaces\|)/brand-simply-silver-by-jon-richard/	/shop/accessories/jewellery/necklaces/brand-simply-silver-by-jon-richard/
Regex245	/shop/(fashion/dresses/occasion-dresses/\|)bridesmaid/	/shop/fashion/dresses/occasion-dresses/bridesmaid/
Regex245	/shop/(fashion/dresses/occasion-dresses/\|)bridesmaid/	/shop/bridesmaid/

Unfortunately, I have no code to show for this part, as it's slightly beyond my current knowledge. Thanks

If you have no code, how are you using pandas and seeing the dataframe? — Jaden Lorenc, Feb 03 '22 at 17:12
Point taken, I've amended the question. I have the code for the dataframe and other manipulations I've carried out, but wouldn't have been relevant to show for this question. — Stuart Houghton, Feb 03 '22 at 17:17
You should first work through the [Python tutorial](https://docs.python.org/3/tutorial/) if not done yet. — Michael Butscher, Feb 03 '22 at 17:17

score 0 · Answer 1 · answered Feb 03 '22 at 17:30

You can iterate over the rows of the dataframe like this, then use exrex (a third-party library) to generate every string that each regex can match.
Because one regex yields several URLs, you need to construct a new dataframe with one row for every result that exrex generates.
It might look something like this (completely untested):

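import pandas as pd
import exrex

# originalDataFrame is assumed to hold the 'regex_no' and 'regex' columns from the question
rows = []
for _, row in originalDataFrame.iterrows():
    # exrex.generate yields each string the regex can match
    for url in exrex.generate(row['regex']):
        rows.append({'regex_no': row['regex_no'], 'regex': row['regex'], 'testUrl': url})

# build the result in one go; DataFrame.append was removed in pandas 2.0
df2 = pd.DataFrame(rows, columns=['regex_no', 'regex', 'testUrl'])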
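If installing exrex is not an option, the same expansion can be sketched with the standard library alone. This is a minimal sketch, not a general regex expander: it assumes every group is a flat, non-nested list of literal alternatives such as `(brand-|)`, and it assumes the `\|` shown in the question's tables is table escaping for a plain `|`. The names `expand_alternations` and `build_rows` are made up for illustration.

```python
import itertools
import re

# Matches one non-nested parenthesised group, e.g. "(bracelets|necklaces|)".
GROUP = re.compile(r"\(([^()]*)\)")


def expand_alternations(pattern):
    """Return every literal string described by flat (a|b|) groups in pattern."""
    # With a capturing group, re.split returns
    # [literal, group body, literal, group body, ..., literal].
    parts = GROUP.split(pattern)
    literals = parts[0::2]
    choices = [body.split("|") for body in parts[1::2]]

    results = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for chosen, literal in zip(combo, literals[1:]):
            pieces.append(chosen)
            pieces.append(literal)
        results.append("".join(pieces))
    return results


def build_rows(records):
    """Turn (regex_no, regex) pairs into one dict per generated test URL."""
    rows = []
    for regex_no, regex in records:
        # Assumption: a backslash before the pipe is table escaping, not regex.
        plain = regex.replace("\\|", "|")
        for url in expand_alternations(plain):
            rows.append({"regex_no": regex_no, "regex": regex, "testUrl": url})
    return rows


records = [
    ("Regex4", r"/shop/accessories/jewellery/necklaces/(brand-\|)jon-richard/"),
    ("Regex245", r"/shop/(fashion/dresses/occasion-dresses/\|)bridesmaid/"),
]
rows = build_rows(records)
for row in rows:
    print(row["regex_no"], row["testUrl"])
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` to get the regex_no, regex and testUrl columns. Note that for Regex5 as written, the empty alternative sits between two slashes, so a literal expansion gives `/shop/accessories/jewellery//brand-simply-silver-by-jon-richard/` with a double slash rather than the single slash shown in the expected output.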

Thanks, I'll give it a try — Stuart Houghton, Feb 03 '22 at 17:39
