I have a redirect file loaded into a pandas DataFrame. Several of its regexes contain "or" (alternation) groups.

regex_no regex
Regex4 /shop/accessories/jewellery/necklaces/(brand-|)jon-richard/
Regex5 /shop/accessories/jewellery/(bracelets|necklaces|)/brand-simply-silver-by-jon-richard/
Regex245 /shop/(fashion/dresses/occasion-dresses/|)bridesmaid/

I'd like to add a testUrl column that holds every URL each regex can match, one row per alternative, so I can run automated tests against them. The result would look like this:

regex_no regex testUrl
Regex4 /shop/accessories/jewellery/necklaces/(brand-|)jon-richard/ /shop/accessories/jewellery/necklaces/brand-jon-richard/
Regex4 /shop/accessories/jewellery/necklaces/(brand-|)jon-richard/ /shop/accessories/jewellery/necklaces/jon-richard/
Regex5 /shop/accessories/jewellery/(bracelets|necklaces|)/brand-simply-silver-by-jon-richard/ /shop/accessories/jewellery/bracelets/brand-simply-silver-by-jon-richard/
Regex5 /shop/accessories/jewellery/(bracelets|necklaces|)/brand-simply-silver-by-jon-richard/ /shop/accessories/jewellery/brand-simply-silver-by-jon-richard/
Regex5 /shop/accessories/jewellery/(bracelets|necklaces|)/brand-simply-silver-by-jon-richard/ /shop/accessories/jewellery/necklaces/brand-simply-silver-by-jon-richard/
Regex245 /shop/(fashion/dresses/occasion-dresses/|)bridesmaid/ /shop/fashion/dresses/occasion-dresses/bridesmaid/
Regex245 /shop/(fashion/dresses/occasion-dresses/|)bridesmaid/ /shop/bridesmaid/

Unfortunately, I have no code to show for this, as it's slightly beyond my current knowledge. Thanks.

1 Answer

You can iterate over the rows of the dataframe and use exrex to generate every string that each regex can match.
You then need to build a new dataframe with one row for every result that exrex generates.
It might look something like this (untested):

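import pandas as pd
import exrex

# originalDataFrame is assumed to have 'regex_no' and 'regex' columns
rows = []
for _, row in originalDataFrame.iterrows():
    # exrex.generate yields every string the pattern can match
    for url in exrex.generate(row['regex']):
        rows.append({'regex_no': row['regex_no'], 'regex': row['regex'], 'testUrl': url})

# Build the result once; DataFrame.append was removed in pandas 2.0
df2 = pd.DataFrame(rows, columns=['regex_no', 'regex', 'testUrl'])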
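
If installing exrex is not an option, the same expansion can be sketched with the standard library alone. This is a swapped-in technique, not what exrex does internally: it assumes every pattern consists only of literal text and non-nested `(a|b|)` groups, as in the question's examples. The names `GROUP`, `expand_alternations`, `rows`, and `expanded` are hypothetical, chosen for illustration.

```python
import itertools
import re

# Matches one non-nested group such as "(brand-|)" or "(bracelets|necklaces|)".
GROUP = re.compile(r"\(([^()]*)\)")


def expand_alternations(pattern):
    """Return every string matched by a pattern made only of literal text
    and non-nested (a|b|...) groups. Other regex syntax is not handled."""
    # re.split with one capturing group alternates literal text (even
    # indexes) with the group bodies (odd indexes).
    parts = GROUP.split(pattern)
    choices = [
        part.split("|") if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


rows = [
    ("Regex4", "/shop/accessories/jewellery/necklaces/(brand-|)jon-richard/"),
    ("Regex245", "/shop/(fashion/dresses/occasion-dresses/|)bridesmaid/"),
]

# One (regex_no, regex, testUrl) tuple per generated URL.
expanded = [
    (regex_no, regex, url)
    for regex_no, regex in rows
    for url in expand_alternations(regex)
]

for regex_no, regex, url in expanded:
    # Sanity check: each generated URL must match the pattern it came from.
    assert re.fullmatch(regex, url)
    print(regex_no, url)
```

The resulting list of tuples can be passed to `pd.DataFrame(expanded, columns=['regex_no', 'regex', 'testUrl'])`. Note that an empty alternative followed by a slash, as in `(bracelets|necklaces|)/`, expands to a double slash (`jewellery//brand-...`), so such patterns may need the slash moved inside the group to produce the URL you expect.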
Jaden Lorenc