Python, remove all non-alphabet chars from string only for specified words

Question

I'm trying to replace certain alphanumeric values in a column called 'Tags'. The values look like "PSPK01012L9_microsoft_abc", and I want to turn them into "microsoft_abc".

I have tried several regular expressions, but each attempt ends up changing every value in the column, not just the matching ones:

import re

s = dataframe['Tags']
dataframe['Tags'] = re.sub('[A-Za-z0-9_]*_microsoft_abc', 'microsoft_abc', str(s))
dataframe['Tags'] = re.sub('[A-Za-z0-9_]*_google_abc', 'google_abc', str(s))

It would be great if someone could help me out. I'm a Python newbie :(

Desired output in my CSV column 'Tags':

IAM~3rd                            
IAM~3rd, IAM~KI-000                 
IAM~1st                             
IAM~KI-000                          
IAM~3rd, IAM~KI-057                
microsoft_abc
google_abc

Current output with the above regex:

dataframe['Tags'].value_counts()

0       0       microsoft_abc  google_abc\...\n1       0       microsoft_abc  google_abc\...\n2       0       microsoft_abc  google_abc\...\n3       0       microsoft_abc  google_abc\...\n4       0       microsoft_abc  google_abc\...\n                              ...                        \n4762    0       microsoft_abc  google_abc\...\n4763    0       microsoft_abc  google_abc\...\n4764    0       microsoft_abc  google_abc\...\n4765    0       microsoft_abc  google_abc\...\n4766    0       microsoft_abc  google_abc\...\nName: Tags, Length: 4767, dtype: object    4767
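A likely explanation for the output above: `str(s)` turns the whole Series into one printed (and truncated) string, so `re.sub` returns a single string that pandas then assigns to every row. That is why `value_counts()` shows one value with a count of 4767. The second line also works on `s` again rather than on the result of the first line. A minimal sketch of a per-row alternative, reusing the question's own patterns with pandas' vectorized `Series.str.replace` and `regex=True`; the sample rows are illustrative, not the asker's real data:

```python
import pandas as pd

# Illustrative sample shaped like the question's 'Tags' column.
dataframe = pd.DataFrame({"Tags": [
    "IAM~3rd",
    "IAM~3rd, IAM~KI-000",
    "PSPK01012L9_microsoft_abc",
    "PSPK01012L9_google_abc",
]})

# Apply each regex row by row via the .str accessor, and chain the second
# replacement onto the result of the first instead of starting from s again.
dataframe["Tags"] = (
    dataframe["Tags"]
    .str.replace(r"[A-Za-z0-9_]*_microsoft_abc", "microsoft_abc", regex=True)
    .str.replace(r"[A-Za-z0-9_]*_google_abc", "google_abc", regex=True)
)
print(dataframe["Tags"].tolist())
# ['IAM~3rd', 'IAM~3rd, IAM~KI-000', 'microsoft_abc', 'google_abc']
```

Rows that do not contain one of the specified words pass through unchanged, which matches the desired output.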

Can you try `dataframe['Tags'].str.split('_', n=1).str[-1]`? If all the tags look like `"PSPK01012L9_microsoft_abc"`, this should work. – anky, Jan 12 '20 at 16:53
@anky_91 It's working, thank you so much :). Is it possible to modify it for cells that hold two tags, like "P12462LK_microsoft_abc, P12462LK_google_abc"? With the above method that cell comes out as "abc, P12462LK_google_abc". – D C, Jan 12 '20 at 18:22
This answer explains how to reuse matched groups: https://stackoverflow.com/questions/41472951/using-regex-matched-groups-in-pandas-dataframe-replace-function Also see the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html – Markus Rother, Jan 13 '20 at 10:25
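A sketch of both suggestions in this thread, on illustrative values. For single-tag cells, `Series.str.split` with `n=1` (passed by keyword) cuts only at the first underscore, and `.str[-1]` keeps the last piece, so a tag without an underscore comes back whole. For cells holding several comma-separated tags, the matched-group approach from the linked answer captures the word to keep and writes it back with `\1`. The pattern assumes each prefix consists only of letters and digits and that `microsoft_abc` and `google_abc` are the only words to clean; other words would need to be added to the alternation.

```python
import pandas as pd

# Single-tag cells: split at the first underscore only, keep the last piece.
single = pd.Series(["PSPK01012L9_microsoft_abc", "IAM~3rd"])
first_split = single.str.split("_", n=1).str[-1]
print(first_split.tolist())
# ['microsoft_abc', 'IAM~3rd']

# Multi-tag cells: \b anchors each prefix at the start of a tag,
# and group 1 captures the word that should remain.
multi = pd.Series([
    "P12462LK_microsoft_abc, P12462LK_google_abc",
    "IAM~3rd, IAM~KI-057",
])
pattern = r"\b[A-Za-z0-9]+_(microsoft_abc|google_abc)\b"
cleaned = multi.str.replace(pattern, r"\1", regex=True)
print(cleaned.tolist())
# ['microsoft_abc, google_abc', 'IAM~3rd, IAM~KI-057']
```

The split approach removes only the first prefix in a cell, so in a multi-tag cell the later tags keep theirs; the regex handles every tag in the cell in one pass.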

score 0 · Answer 1 · answered Jan 12 '20 at 16:29


dataframe['new_tags'] = dataframe['Tags'].apply(lambda x: '_'.join(x.split('_')[1:]))

This snippet creates a new column, new_tags, containing each tag with its first underscore-separated segment (the alphanumeric prefix) removed.

Ananth Reddy

AttributeError: 'NoneType' object has no attribute 'split' – D C Jan 12 '20 at 17:22
Please provide the sample of dataframe you are using. – Ananth Reddy Jan 12 '20 at 20:08
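The AttributeError above is consistent with `apply` handing every cell to the lambda, including missing values stored as `None`, which have no `split` method. A hedged sketch that guards against that case; `strip_prefix` is a hypothetical helper, and it uses `str.partition` so that, unlike the snippet in the answer, tags without an underscore (such as "IAM~3rd") are kept instead of becoming empty strings:

```python
import pandas as pd


def strip_prefix(tag):
    """Hypothetical helper: drop everything up to and including the first underscore.

    Missing values (None/NaN) and tags without an underscore are returned unchanged.
    """
    if not isinstance(tag, str):
        return tag  # None/NaN: nothing to split, so no AttributeError
    _, sep, rest = tag.partition("_")
    return rest if sep else tag


dataframe = pd.DataFrame({"Tags": ["PSPK01012L9_microsoft_abc", "IAM~3rd", None]})
dataframe["new_tags"] = dataframe["Tags"].apply(strip_prefix)
print(dataframe["new_tags"].tolist())
```

Like the split-based approaches, this only strips the first prefix in a cell, so cells with several tags still need a regex.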