
This was originally marked as a duplicate, but this question is about pandas, so it is different from the question it was marked as a duplicate of. I am trying to use re.sub to remove, from every pandas cell, the first occurrence of each string in my list.
I have:

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)

strings_to_remove = ["hello", "for", "an"]

I want an output like:

df2 = pd.DataFrame(
    {
        'ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
        'name': {
            0: ' kty hello',
            1: ' puppy',
            2: ' is  example',
            3: ' stackoverflow',
            4: ' world',
        },
    }
)

Notice that in the 'name' column of df2, only the first occurrence of "hello" is removed from each cell (row 0 keeps its second "hello").
I'm looking to use something like re.sub, but I'm not sure how to make it remove only the first occurrence of 'hello' within each cell. Any ideas?
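For context, here is a minimal sketch of the re.sub behavior at issue, on a single string: by default re.sub replaces every non-overlapping match, while its count argument caps the number of replacements, so count=1 affects only the first match.

```python
import re

text = "hello kitty hello"

# The default count=0 means "replace every match".
all_removed = re.sub("hello", "", text)

# count=1 stops after the first match, leaving the second "hello" intact.
first_removed = re.sub("hello", "", text, count=1)

print(repr(all_removed))    # ' kitty '
print(repr(first_removed))  # ' kitty hello'
```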

codingInMyBasement

1 Answer

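You can tell re.sub the maximum number of substitutions to make through its count argument. Note that "it" is added to the list below, because the expected output also strips it ("kitty" becomes "kty", and "it is" becomes " is").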
import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)

strings_to_remove = ["hello", "for", "an", "it"]


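# Remove each word in order; count=1 makes re.sub replace only the first match per cell.
# re.escape makes each word match literally, even if it contains regex metacharacters.
for word in strings_to_remove:
    df['name'] = df['name'].apply(lambda x: re.sub(re.escape(word), '', x, count=1))

print(df)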

Output:

    ID  name
0   1   kty hello
1   2   puppy
2   3   is  example
3   4   stackoverflow
4   5   world
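If you would rather skip the lambda, pandas' Series.str.replace also accepts n (the maximum number of replacements per string). With regex=False each word is matched as a literal substring, so no escaping is needed. This is a sketch of an alternative, not the approach above; it assumes the same data and word list (including "it"):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "hello kitty hello",
            "hello puppy",
            "it is an helloexample",
            "for stackoverflow",
            "hello world",
        ],
    }
)

strings_to_remove = ["hello", "for", "an", "it"]

# Apply the words in order; n=1 removes only the first occurrence in each cell.
# regex=False treats each word as a plain substring rather than a pattern.
for word in strings_to_remove:
    df["name"] = df["name"].str.replace(word, "", n=1, regex=False)

print(df["name"].tolist())
# [' kty hello', ' puppy', ' is  example', ' stackoverflow', ' world']
```

As with the re.sub loop, the order of the list matters, because each word is removed from the result left by the previous one.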
Poojan