python pandas use map with regular expressions

Question

I have a dict:

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON 00000120': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD 8429': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}

and I want to map it to a pandas df:

entries.Categoria = entries.Commerciante.map(dealer)

Is there a way to use regex to match map on "Commerciante" column? In this way I can rewrite dealer as this:

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}

and match both "DECATHLON" and "DECATHLON 00000120"

score 2 · Accepted Answer · answered May 13 '15 at 04:53

Thank you to all of you. I used your suggestions to resolve my problem. I defined a new function:

def dealer_replace(dealer_dict, text):

    regex = re.compile("(%s)" % "|".join(map(re.escape, dealer_dict.keys())))

    if regex.search(text):
        ret = regex.search(text)
        return dealer_dict[ret.group()]
    else:
        return None

And use it with apply

entries['Categoria'] = entries['Commerciante'].apply(lambda v: dealer_replace(dealer, str(v)))

score 1 · Answer 2 · answered May 12 '15 at 06:41

One can use a dict comprehension with a regular expression to rewrite key. The re python module is used to perform this task, with the command sub. The substitution key looks like:

import re
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}

The whole example gives:

import re
dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON 00000120': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD 8429': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}

score 1 · Answer 3 · answered May 12 '15 at 21:03

I think your problem is that you are trying to do two things in one step.

First clean your data, then map it.

pandas Series holds lots of nice string functions which can come in handy for cleaning your data. Here is a good reference to the string methods.

Once you have used the string methods for cleaning your data, mapping it will be easy as pie.

score 1 · Answer 4 · answered May 17 '23 at 14:36

Another approach is using df.replace. If you set the DECATHLON key as a regex like r'^DECATHLON.*' in dealer you could do something like this,

dealer = {
    'ESSELUNGA': 'Spesa', 
    r'DECATHLON.*': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD 8429': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}

df['Commerciante'] = df['Commerciante'].replace(regex=dealer)

score 0 · Answer 5 · answered May 12 '15 at 08:32

Why don't you use apply and on a modified dictionary lookup:

In [14]: [dname for dname in dealer if 'DECATHLON' in dname]
Out[14]: ['DECATHLON 00000120']

And, apply it like this -

df['Commerciante'] = df['Commerciante'].apply(lambda v: [dname for dname in dealer if dname.startswith('DECATHLON')][0])

python pandas use map with regular expressions

5 Answers5

Linked