remove similar words in pandas dataframe

Question

I have a dataframe from which I want to remove every occurrence of "XXXX" in any form, since this word appears in my data in many variations. For example, my dataframe looks something like this:

['XXXX/XXXX/16', '{', '$', '39.00', '}', 'XXXX/XXXX/2016', '.', 'excessive', 'charges', 'would', 'like', 'dispute', '.']
['XX/XX/XXXX', 'date', 'last', 'payment', ',', 'last', 'payment', 'made', 'XX/XX/XXXX']
['Collector', 'calls', 'non', 'stop', '.', 'XXXX/XXXX/15', 'Med', 'XXXXXXXX', '{', '$', '290.00', '}', 'XX/XX/XXXX-XX/XX/XXXX']

The desired output should have every occurrence of "XX" removed, in any of the forms shown above.

The code that I have used here is

stop = ['XXXX', 'XX/XX']
df['issue_detail'] = df['issue_detail'].apply(lambda x: [item for item in x if item not in stop])

The above code only removes exact occurrences of "XXXX" from the dataframe. How should I remove the rest of the XXXX occurrences, which appear in different forms as shown above?

Do you want to fully remove the rows if anything in stop appears in those rows? Or are you removing those substrings from every row? — ALollz, Jan 27 '19 at 18:38
It's not clear how "XXXX" varies in your example. Did you mean that "XXXX" can have intervening characters? It would help if you can show us an example of the input and the expected output after removal of "XXXX" — kentwait, Jan 27 '19 at 18:39
@ALollz I just want to remove any occurrence of "XXXX" or similar from the dataframe — Swati Kanchan, Jan 27 '19 at 18:43

score 0 · Accepted Answer · answered Jan 27 '19 at 18:41

It seems like you're looking for regular expressions. If I understand your problem correctly, this question is very much related to what you're asking.

1. Create a regular expression.
2. Apply df.column_name.str.match to the column. This returns a boolean Series holding True or False for each row.
3. Filter the dataframe using the boolean Series from the previous step.

Have a look at this specific answer to see the related code.
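
As a minimal sketch of the three steps listed in this answer, assuming a column that holds one string per row: the column name `token`, the sample values, and the pattern are all made up for illustration and are not taken from the linked answer.

```python
import pandas as pd

# Hypothetical data: one token per row (not the asker's real dataframe).
df = pd.DataFrame(
    {"token": ["XXXX/XXXX/16", "excessive", "XX/XX/XXXX", "charges", "XXXXXXXX", "39.00"]}
)

# Step 1: an assumed pattern for masked values: runs of "X" separated by
# slashes, optionally followed by a slash and digits (e.g. "XXXX/XXXX/16").
pattern = r"X+(?:/X+)*(?:/\d+)?$"

# Step 2: str.match anchors at the start of each string and returns a
# boolean Series; the trailing "$" makes the whole string have to match.
is_masked = df["token"].str.match(pattern)

# Step 3: keep only the rows that did not match.
kept = df[~is_masked].reset_index(drop=True)
print(kept["token"].tolist())
```

Note that this filters whole rows, so it fits a column of plain strings. If each cell holds a list of tokens, as in the question, the pattern has to be applied per token instead.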

bartcode

@bartcode After reading the linked answer, I understand that I would have to pass a list of all the occurrences of XX, but since there are thousands of different occurrences of XX, this becomes very difficult to do – Swati Kanchan Jan 27 '19 at 18:55
It's unclear whether you literally mean "XXX" or some other word. But if you mean literally "XXXX", "XXX/XX/XX", and "XXX/XX/X/15", you could create an expression such as `(X+\/?)+([0-9]+)?`. Take a look at [regex101](https://regex101.com/) to validate your regular expression. – bartcode Jan 27 '19 at 19:31
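
For the setup in the question, where each cell of `issue_detail` is a list of tokens, a regular expression can replace the fixed `stop` list. The sketch below is one possible approach, not the commenter's exact pattern. It assumes that any token containing two or more consecutive uppercase "X" characters is a masked placeholder, which would also drop legitimate tokens such as "XXL". The helper name `drop_masked` and the sample rows are hypothetical.

```python
import re

import pandas as pd

# Assumption: two or more consecutive uppercase "X" mark a masked token.
# The match is case-sensitive, so words like "excessive" are unaffected.
MASK = re.compile(r"X{2,}")


def drop_masked(tokens):
    """Return the tokens that contain no masked placeholder."""
    return [tok for tok in tokens if not MASK.search(tok)]


# Hypothetical rows shaped like the lists in the question.
df = pd.DataFrame(
    {
        "issue_detail": [
            ["XXXX/XXXX/16", "{", "$", "39.00", "}", "excessive", "charges"],
            ["Collector", "calls", "XXXX/XXXX/15", "Med", "XXXXXXXX", "XX/XX/XXXX-XX/XX/XXXX"],
        ]
    }
)

# Apply the filter to every list in the column.
df["issue_detail"] = df["issue_detail"].apply(drop_masked)
print(df["issue_detail"].tolist())
```

Using `search` rather than `match` also catches a placeholder that appears in the middle of a token, so no list of variants has to be maintained by hand.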
