This question is a follow-up to "use values in dictionary to replace values in column". The following dataframe is a modification of the dataframe used in that question:

import pandas as pd
dff= pd.DataFrame({'Data':['Hey BL171111 this is 123456 Jonny Good (511)2321134 1A1619',
                              'This is Jonny Good at 511-233-1137 A-1-24',
                          'Jonny Good and go way back in 03-15-2001',
                                  'Wow AL-17-1111 that is Alice Wonderland A999b dont 5643389 cool but NOT 1-2-2001',
                                  'Yes hi: Mick Mouse 1A25629Q88 or ',
                            'Bye Mick Mouse A13B ok was seen on 19S-9'], 
                          'E_ID': ['E11','E11','E11', 'E22', 'E33', 'E33'],
                           'N_ID' : ['111', '112', '113', '211', '311', '312'],
                           'Name' : ['JONNY GOOD', 'JONNY GOOD', 'JONNY GOOD', 
                                      'ALICE WONDERLAND',
                                      'MICK MOUSE', 'MICK MOUSE'],        
                          })

Here is the new dictionary

dd = {'E11': ['123456',
  'Jonny',
  'Good',
  '(511)2321134',
  '1A1619',
   'Jonny',
  'Good',
  '511-233-1137',
   'BL171111',
   'A-1-24',
   'Jonny',
  'Good',
  '03-15-2001'],

'E22': ['Alice',
  'Wonderland',
    'AL-17-1111',
  'A999b',
  '5643389',
  '1-2-2001'],

'E33': ['Mick', 
        'Mouse',
  '1A25629Q88',
        'Mick', 
        'Mouse',
  'A13B',
  '19S-9']}

When I apply the answer from "use values in dictionary to replace values in column", the first step seems to work well. That is, every value from dd, including the new values I added such as '(511)2321134', is paired with a corresponding '@@@':

d2 = {k: {x: '@@@' for x in v} for k, v in dd.items()}
d2
{'E11': {'(511)2321134': '@@@',
  '03-15-2001': '@@@',
  '123456': '@@@',
  '1A1619': '@@@',
  '511-233-1137': '@@@',
  'A-1-24': '@@@',
  'BL171111': '@@@',
  'Good': '@@@',
  'Jonny': '@@@'},
 'E22': {'1-2-2001': '@@@',
  '5643389': '@@@',
  'A999b': '@@@',
  'AL-17-1111': '@@@',
  'Alice': '@@@',
  'Wonderland': '@@@'},
 'E33': {'19S-9': '@@@',
  '1A25629Q88': '@@@',
  'A13B': '@@@',
  'Mick': '@@@',
  'Mouse': '@@@'}}

I then use the following code, also taken from "use values in dictionary to replace values in column":

dff['New_Data'] = (dff.pivot(columns='E_ID', values='Data')
                .replace(d2, regex=True).bfill(axis=1).iloc[:,0])
dff



        New_Data
0       Hey @@@ this is @@@ @@@ @@@ @@@ (511)2321134 @@@
1       This is @@@ @@@ @@@ at @@@ @@@
2       @@@ @@@ @@@ and go way back in @@@
3       Wow @@@ that is @@@ @@@ @@@ @@@ dont @@@ cool but NOT @@@
4       Yes hi: @@@ @@@ @@@ @@@ or
5       Bye @@@ @@@ @@@ @@@ ok was seen on @@@

But (511)2321134 in row 0 is not turned into @@@, and I am not sure why.

Ideally I would like the following:

        New_Data
0       Hey @@@ this is @@@ @@@ @@@ @@@ @@@ @@@
1       This is @@@ @@@ @@@ at @@@ @@@
2       @@@ @@@ @@@ and go way back in @@@
3       Wow @@@ that is @@@ @@@ @@@ @@@ dont @@@ cool but NOT @@@
4       Yes hi: @@@ @@@ @@@ @@@ or
5       Bye @@@ @@@ @@@ @@@ ok was seen on @@@

It seems like

d2 = {k: {x: '@@@' for x in v} for k, v in dd.items()}

is working well, so I am assuming the issue is in

(dff.pivot(columns='E_ID', values='Data')
                    .replace(d2, regex=True).bfill(axis=1).iloc[:,0])

How do I alter the code so that (511)2321134 is also turned into @@@?

1 Answer

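( and ) are special characters in regex (they delimit a group), so with regex=True you need to escape them, turning (511)2321134 into \(511\)2321134 (shown as '\\(511\\)2321134' in the dictionary repr below). Use the same solution as in my other answer; it just needs an additional str.maketrans and translate step: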

# Map each parenthesis to its backslash-escaped form; raw strings keep the backslash literal.
tr = str.maketrans({"(": r"\(", ")": r"\)"})
d2 = {k: {x.translate(tr): '@@@' for x in v} for k, v in dd.items()}

Out[864]:
{'E11': {'123456': '@@@',
  'Jonny': '@@@',
  'Good': '@@@',
  '\\(511\\)2321134': '@@@',
  '1A1619': '@@@',
  '511-233-1137': '@@@',
  'BL171111': '@@@',
  'A-1-24': '@@@',
  '03-15-2001': '@@@'},
 'E22': {'Alice': '@@@',
  'Wonderland': '@@@',
  'AL-17-1111': '@@@',
  'A999b': '@@@',
  '5643389': '@@@',
  '1-2-2001': '@@@'},
 'E33': {'Mick': '@@@',
  'Mouse': '@@@',
  '1A25629Q88': '@@@',
  'A13B': '@@@',
  '19S-9': '@@@'}}
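
To see why the unescaped key never matched, here is a minimal sketch that uses only the standard re module (as far as I know, pandas follows the same regex syntax when regex=True). The names text, raw, unescaped and escaped are just for illustration:

```python
import re

text = "Hey BL171111 this is 123456 Jonny Good (511)2321134 1A1619"
raw = "(511)2321134"

# Unescaped, the parentheses form a capture group, so the pattern looks
# for the digits "5112321134" with no literal parentheses around 511.
unescaped = re.sub(raw, "@@@", text)

# Escaped, the parentheses are matched as literal characters.
escaped = re.sub(r"\(511\)2321134", "@@@", text)

print(unescaped)  # unchanged text
print(escaped)    # phone number replaced
```

The first call leaves the string untouched because the text never contains 511 immediately followed by 2321134; the second replaces the phone number.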

(dff.pivot(columns='E_ID', values='Data')
    .replace(d2, regex=True).bfill(axis=1).iloc[:,0])

Out[867]:
0    Hey @@@ this is @@@ @@@ @@@ @@@ @@@
1    This is @@@ @@@ at @@@ @@@
2    @@@ @@@ and go way back in @@@
3    Wow @@@ that is @@@ @@@ @@@ dont @@@ cool but NOT @@@
4    Yes hi: @@@ @@@ @@@ or
5    Bye @@@ @@@ @@@ ok was seen on @@@
Name: E11, dtype: object
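
If the values in dd might contain other regex metacharacters besides parentheses (for example ., + or [), a more general option is re.escape from the standard library, which escapes all of them at once. This is only a sketch on a shortened, illustrative dd, not the full dictionary from the question:

```python
import re

# Shortened, illustrative version of dd from the question.
dd = {
    'E11': ['123456', 'Jonny', 'Good', '(511)2321134', 'A-1-24'],
    'E22': ['Alice', 'Wonderland', 'A999b'],
}

# re.escape backslash-escapes every regex metacharacter, not just ( and ).
d2 = {k: {re.escape(x): '@@@' for x in v} for k, v in dd.items()}

# Sanity check: each escaped key still matches its original literal text.
for values in dd.values():
    for x in values:
        assert re.fullmatch(re.escape(x), x)

print(d2['E11'])
```

The resulting d2 can be passed to the same .replace(d2, regex=True) call. Note that re.escape also escapes the hyphens (A\-1\-24), which is harmless since the pattern still matches the literal text.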
Andy L.