Remove only first occurrence of a pandas cell that matches strings in a list

Question

This was originally marked as a duplicate, but it concerns pandas, so it differs from the question it was linked to. I am trying to use re.sub to remove, within each pandas cell, the first occurrence of each string from my list.
I have:

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)

strings_to_remove = ["hello", "for", "an", "it"]

I want an output like:

df2 = pd.DataFrame(
    {
      'ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
      'name': {0: ' kty hello',
       1: ' puppy',
       2: ' is  example',
       3: ' stackoverflow',
       4: ' world'}}
)

Notice how only the first occurrence of hello is removed from df2 under the 'name' column for each cell.
I would like to use something like re.sub, but I am not sure how to make it remove only the first occurrence of 'hello' within each cell. Any ideas?

thanks for pointing this out. Just remove it. It should be removed too. — codingInMyBasement, Nov 20 '19 at 20:28
Your question is confusing. You said each occurrence of `'hello'` only should be removed. Can you please clarify what will happen in this case? `'it an for hello'`. What will be output for this? — Poojan, Nov 20 '19 at 20:29
@acodejdatam __"but not sure how to get the code to only remove the first occurrence of 'hello' within each cell."__ — Poojan, Nov 20 '19 at 20:30
You need to review your sample/output. Why `hello` in `ID 3` is removed but `it` in `kitty` of `ID 1` is not. — Quang Hoang, Nov 20 '19 at 20:31
"You said each occurrence of" - is what you said @Poojan. I said 'first occurrence'. — codingInMyBasement, Nov 20 '19 at 20:33
Will `kitty` become `kty` after removing `it`? Your output is full of mismatches with what you are stating. Please clarify your output. I don't understand how this question got closed. – Poojan, Nov 20 '19 at 20:35
Then why `hello kitty hello` output is `kitty hello` and not `kty hello`? — Poojan, Nov 20 '19 at 20:37

Poojan · Accepted Answer · 2019-11-20T20:38:31.440

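re.sub accepts a count argument that limits how many substitutions it makes. With a count of 1, only the first match in each string is replaced. The code below loops over the words and removes each one at most once per cell.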
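As a quick illustration of the count argument on a plain string, before pandas is involved, here is a minimal sketch. The variable names are only for illustration.

```python
import re

text = "hello kitty hello"

# count=1 stops after the first match; the second "hello" is left alone.
first_only = re.sub("hello", "", text, count=1)

# With the default count=0, every match is replaced.
all_matches = re.sub("hello", "", text)

print(repr(first_only))   # ' kitty hello'
print(repr(all_matches))  # ' kitty '
```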

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)

strings_to_remove = ["hello", "for", "an", "it"]


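# Remove each word at most once per cell: count=1 limits re.sub to the first match.
# re.escape keeps a word from being interpreted as a regex pattern.
for word in strings_to_remove:
    df["name"] = df["name"].apply(lambda x: re.sub(re.escape(word), "", x, count=1))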

df

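Output (each removed word leaves its surrounding whitespace behind, so the actual values start with a space):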

    ID  name
0   1   kty hello
1   2   puppy
2   3   is example
3   4   stackoverflow
4   5   world
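The comments on the question ask why `it` inside `kitty` should be removed at all. The loop above matches substrings, so it does remove it. The sketch below wraps the same idea in a helper and adds an optional whole-word mode. The names `remove_first_occurrences` and `whole_words` are hypothetical, introduced here for illustration only, and the whole-word behaviour is an assumption about what some readers may want, not something the question asked for.

```python
import re

import pandas as pd


def remove_first_occurrences(text, words, whole_words=False):
    """Remove the first match of each word from text, one word at a time."""
    for word in words:
        pattern = re.escape(word)  # treat the word literally, not as a regex
        if whole_words:
            # \b anchors the match to word boundaries, so "it" in "kitty" is skipped
            pattern = r"\b" + pattern + r"\b"
        text = re.sub(pattern, "", text, count=1)
    return text


df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)
strings_to_remove = ["hello", "for", "an", "it"]

# Substring matching, equivalent to the loop in the answer.
substring_result = df["name"].apply(
    lambda x: remove_first_occurrences(x, strings_to_remove)
)

# Whole-word matching: "kitty" and "helloexample" are left intact.
whole_word_result = df["name"].apply(
    lambda x: remove_first_occurrences(x, strings_to_remove, whole_words=True)
)

print(substring_result.tolist())
print(whole_word_result.tolist())
```

In substring mode the first row becomes `' kty hello'`, as in the question's expected output. In whole-word mode it stays `' kitty hello'`.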

edited Nov 20 '19 at 20:38

answered Nov 20 '19 at 20:26

Poojan


Thank you! This is what I was looking for. I updated the question as well. – codingInMyBasement Nov 20 '19 at 20:40
Copy and paste the above code and run it. It's working as expected for me. – Poojan Nov 20 '19 at 20:45
