Search for specific keyword in pandas and then edit the cell

Question

I am trying to edit my pandas dataframe based on some specifications. I need a certain layout of my cells in order for my program to work. Currently, my data looks something like this:

    x      y
A   1  information
B   2  information and some stuff
C   3  information and random stuff

But I need it to look like this:

    x      y
A   1  information
B   2  information
C   3  information

So basically, it needs to scan through every cell and check for a keyword ("and" in my example). Then it needs to delete everything from the keyword onward, including the keyword itself, leaving only the important information behind.

I currently just can't wrap my head around an efficient way to do this. Any help is appreciated.

*"it needs to scan through every cell..."* No it doesn't; it only needs to search the string column(s), here 'y'. So your code will simply be `df['y'] = df['y'].str.replace(pattern, replacement, regex=True)`. The rest is you figuring out which regex to use. See the docs for [`str.replace`](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html) - smci, Aug 30 '21 at 20:06
...and if you want to select *all* string columns in your dataframe, use `df.select_dtypes('string')` (or `df.select_dtypes('object')` when the text is stored with the `object` dtype). See [this](https://stackoverflow.com/questions/64374660/apply-transformation-only-on-string-columns-with-pandas-ignoring-numeric-data) - smci, Aug 30 '21 at 20:09
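To make the comments above concrete, here is a minimal sketch of applying the same replacement to every text column rather than only 'y'. It assumes the example frame from the question, reuses the pattern ' and .*' from the accepted answer, and picks text columns with `pd.api.types.is_string_dtype`, which is just one way to do it; the variable name `text_cols` is made up for illustration.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    {
        "x": [1, 2, 3],
        "y": [
            "information",
            "information and some stuff",
            "information and random stuff",
        ],
    },
    index=["A", "B", "C"],
)

# Assumption: a "text column" is any column pandas reports as string-like.
text_cols = [col for col in df.columns if pd.api.types.is_string_dtype(df[col])]

# Drop the keyword and everything after it in each text column.
for col in text_cols:
    df[col] = df[col].str.replace(r" and .*", "", regex=True)

print(df)
```

Numeric columns such as 'x' are skipped, so they keep their dtype and values.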

score 0 · Accepted Answer · answered Aug 30 '21 at 19:53

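You can access the y column and use the .str API to remove the word 'and' together with everything after it. Pass regex=True explicitly, since newer pandas versions treat the pattern as a literal string by default.

df.y = df.y.str.replace(r' and .*', '', regex=True)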
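If the keyword is not hard-coded, a small helper along these lines might be handy. This is only a sketch: `strip_after_keyword` is a made-up name, and it assumes the keyword is a plain word, so `re.escape` and the `\b` word boundaries are enough to avoid cutting inside longer words such as "sandwich".

```python
import re

import pandas as pd


def strip_after_keyword(series, keyword):
    """Remove `keyword` (as a whole word) and everything after it."""
    # \s* also swallows the whitespace just before the keyword.
    pattern = r"\s*\b" + re.escape(keyword) + r"\b.*"
    return series.str.replace(pattern, "", regex=True)


s = pd.Series(
    ["information", "information and some stuff", "sandwich and tea"]
)
print(strip_after_keyword(s, "and").tolist())
```

Usage on the question's frame would be `df['y'] = strip_after_keyword(df['y'], 'and')`.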

James

SeaBean · Answer 2 · 2021-08-30T20:04:59.750

You can split the string on the keyword with .str.split(), then take the part to the left of it with .str[0]:

df['y'] = df['y'].str.split(' and').str[0]

Result:

print(df)

   x            y
A  1  information
B  2  information
C  3  information
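One thing to watch with the split approach: the separator ' and' has no trailing space, so it also matches the start of longer words. The sketch below compares it with a stricter ' and ' separator; the second sample string is invented for illustration, and `n=1` is added only so the string is split at most once.

```python
import pandas as pd

s = pd.Series(["information and some stuff", "information android stuff"])

# ' and' also matches the beginning of 'android'.
loose = s.str.split(" and").str[0]

# ' and ' requires a space on both sides; n=1 stops after the first split.
strict = s.str.split(" and ", n=1).str[0]

print(loose.tolist())
print(strict.tolist())
```

The stricter separator will in turn miss a keyword at the very end of a string, so which one fits depends on the data.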

edited Aug 30 '21 at 20:04

answered Aug 30 '21 at 19:55

SeaBean

score 0 · Answer 3 · answered Aug 30 '21 at 20:02

You can use string.split(" keyword ") to break up the string into a list, then keep only its first element.

import pandas as pd

# create the df to work with:
df = pd.DataFrame(
    {
        "x": [
            1,
            2,
            3,
        ],
        "y": [
            "information",
            "information and some stuff",
            "information and random stuff"
        ]
    }
)


for index in df.index:  # loop over each row label
    current_line = df.loc[index, "y"]  # get the current cell as a string
    current_line_list = current_line.split(" and ")  # create a list. Example: ['information', 'some stuff']
    df.loc[index, "y"] = current_line_list[0]  # the first element is the text before the keyword

Result:

print(df)

   x            y
0  1  information
1  2  information
2  3  information
