
I'm trying to replace certain alphanumeric values in a column called 'Tags', for example turning "PSPK01012L9_microsoft_abc" into "microsoft_abc".

I have tried several approaches with regular expressions, but each one changes every value in the column instead of only the matching ones:

import re

s = dataframe['Tags']
dataframe['Tags'] = re.sub('[A-Za-z0-9_]*_microsoft_abc', 'microsoft_abc', str(s))
dataframe['Tags'] = re.sub('[A-Za-z0-9_]*_google_abc', 'google_abc', str(s))


It would be great if someone could help me out. I'm new to Python.

Desired output in my CSV column 'Tags':

IAM~3rd
IAM~3rd, IAM~KI-000
IAM~1st
IAM~KI-000
IAM~3rd, IAM~KI-057
microsoft_abc
google_abc

Current output with the above regex:

dataframe['Tags'].value_counts()
0       0       microsoft_abc  google_abc\...\n1       0       microsoft_abc  google_abc\...\n2       0       microsoft_abc  google_abc\...\n3       0       microsoft_abc  google_abc\...\n4       0       microsoft_abc  google_abc\...\n                              ...                        \n4762    0       microsoft_abc  google_abc\...\n4763    0       microsoft_abc  google_abc\...\n4764    0       microsoft_abc  google_abc\...\n4765    0       microsoft_abc  google_abc\...\n4766    0       microsoft_abc  google_abc\...\nName: Tags, Length: 4767, dtype: object    4767

D C
  • does the string _always_ come after the underscore? – Umar.H Jan 12 '20 at 16:38
  • can you try `dataframe['Tags'].str.split('_', n=1).str[-1]`? If all the tags are similar to `"PSPK01012L9_microsoft_abc"`, this should work – anky Jan 12 '20 at 16:53
  • @Datanovice yes – D C Jan 12 '20 at 17:07
  • @anky_91 It's working, thank you so much :). Is it possible to adapt it when one cell holds two tags, like "P12462LK_microsoft_abc, P12462LK_google_abc"? With the above method the result is "abc, P12462LK_google_abc" – D C Jan 12 '20 at 18:22
  • This answer explains how to reuse matched groups: https://stackoverflow.com/questions/41472951/using-regex-matched-groups-in-pandas-dataframe-replace-function Also see doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html – Markus Rother Jan 13 '20 at 10:25
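
For cells that hold several tags, the `Series.str.replace` route mentioned in the last comment might look like the sketch below. It assumes the only suffixes of interest are `microsoft_abc` and `google_abc` (taken from the question) and that each prefix is purely alphanumeric; the sample rows are illustrative, not the asker's real data.

```python
import pandas as pd

# Illustrative sample data modelled on the values in the question.
dataframe = pd.DataFrame({
    "Tags": [
        "PSPK01012L9_microsoft_abc",
        "P12462LK_microsoft_abc, P12462LK_google_abc",
        "IAM~3rd, IAM~KI-000",
    ]
})

# An alphanumeric prefix and underscore, followed by a captured known suffix.
pattern = r"[A-Za-z0-9]+_((?:microsoft|google)_abc)"

# \1 keeps only the captured suffix; the replacement runs on each cell
# separately, so cells without a match are left untouched.
dataframe["Tags"] = dataframe["Tags"].str.replace(pattern, r"\1", regex=True)

print(dataframe["Tags"].tolist())
```

Unlike `re.sub(..., str(s))`, which converts the whole Series to one string, the `.str` accessor applies the pattern element by element.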

1 Answer

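dataframe['new_tags'] = dataframe['Tags'].apply(lambda x: x.split('_', 1)[-1])

This snippet creates a new column, 'new_tags', with everything up to and including the first underscore removed. Values that contain no underscore, such as "IAM~3rd", are left unchanged.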
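
The split-based idea can also be extended to cells that contain several comma-separated tags, as raised in the comments. The following is only a sketch: `strip_tag_prefixes` is a hypothetical helper name, and it assumes that every tag containing an underscore carries a prefix to drop (an already clean "microsoft_abc" would be shortened to "abc").

```python
def strip_tag_prefixes(cell: str) -> str:
    """Drop the text before the first underscore in each comma-separated tag."""
    tags = [tag.strip() for tag in cell.split(",")]
    # split("_", 1)[-1] returns the tag unchanged when it has no underscore.
    return ", ".join(tag.split("_", 1)[-1] for tag in tags)


# Illustrative cells modelled on the values in the question.
cells = [
    "PSPK01012L9_microsoft_abc",
    "P12462LK_microsoft_abc, P12462LK_google_abc",
    "IAM~3rd, IAM~KI-000",
]
cleaned = [strip_tag_prefixes(cell) for cell in cells]
print(cleaned)
```

With pandas, the same helper could be applied as `dataframe['Tags'].map(strip_tag_prefixes)`, assuming every value in the column is a string.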

Ananth Reddy