Deleting part of a string pandas DataFrame

Question

Background:
I have the following pandas DataFrame:

Objective:
Each field in the tweet column contains tweets (duh!). I am trying to do two things:

Delete all characters from the string before 'InSight'. So all tweets would begin 'InSight sol...'
Extract the dates from the tweets (they appear just before 'InSight') and save them in a new column named 'Date'.

What I've tried:
I've tried things such as split_string = tweets_df.split("InSight", 1), but I can't seem to write code that splits on part of a string rather than on a delimiter.

Any advice would be greatly appreciated.

Always post your data as text and not as an image. Read https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Vishnudev Krishnadas, Jul 17 '20 at 03:09

Mateo Lara · Accepted Answer · 2020-07-17T03:25:16.490

0

Try using:

pandas.DataFrame.applymap: Apply a function to a DataFrame elementwise.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

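# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
# Keep everything from 'InSight' onward (assumes every tweet contains 'InSight').
new_df = df.filter(['tweet']).applymap(lambda x: x[x.find('InSight'):])
# Take the text between the first '-' and 'InSight', which is the date.
dates_df = df.filter(['tweet']).applymap(lambda x: x[x.find('-') + 1:x.find('InSight')])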
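
One caveat with the slicing above: str.find returns -1 when 'InSight' is missing, so the slice would silently return only the last character of the tweet. Below is a minimal sketch that guards against this, using Series.map on the tweet column. The sample rows and the helper names trim_to_marker and date_before_marker are hypothetical, since the original data is not shown as text.

```python
import pandas as pd

# Hypothetical sample rows; the real tweets are not available as text.
df = pd.DataFrame({"tweet": [
    "Mars Weather@Marsweatherreport-Jul 15InSight sol 58 weather report",
    "No marker in this tweet",
]})

def trim_to_marker(text, marker="InSight"):
    """Return text from the first occurrence of marker; unchanged if absent."""
    pos = text.find(marker)
    return text[pos:] if pos != -1 else text

def date_before_marker(text, marker="InSight", sep="-"):
    """Return the text between the first sep and marker, or None if not found."""
    end = text.find(marker)
    start = text.find(sep)
    if end == -1 or start == -1 or start > end:
        return None
    return text[start + 1:end]

# Series.map applies each helper to every tweet in the column.
df["text"] = df["tweet"].map(trim_to_marker)
df["Date"] = df["tweet"].map(date_before_marker)
print(df[["text", "Date"]])
```

Writing the results back as new columns on df keeps the original tweet alongside the trimmed text and the date.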

edited Jul 17 '20 at 03:25

answered Jul 17 '20 at 02:52


score 0 · Answer 2 · answered Jul 17 '20 at 02:58

You need to assign the trimmed column back to the original column instead of subsetting. Also, the str.replace method doesn't have to_replace and value parameters; it has pat and repl instead:

Example:

df["Date"] = df["Date"].str.replace(r"\s:00", "", regex=True)

df
#   ID       Date 
#0   1  8/24/1995
#1   2   8/1/1899
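
The input for this example is not shown, so here is a runnable sketch with hypothetical input values chosen to reproduce the output above. Note that since pandas 2.0, str.replace treats pat as a literal string unless regex=True is passed.

```python
import pandas as pd

# Hypothetical input: the answer shows only the result, not the original values.
df = pd.DataFrame({
    "ID": [1, 2],
    "Date": ["8/24/1995 :00", "8/1/1899 :00"],
})

# pat and repl are the parameter names of Series.str.replace;
# regex=True is needed so that \s is treated as "any whitespace".
df["Date"] = df["Date"].str.replace(pat=r"\s:00", repl="", regex=True)
print(df)
```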

score 0 · Answer 3 · answered Jul 17 '20 at 03:39

To keep the string from InSight onward, you can remove everything before it with a positive lookahead regex:

df['text'] = df['tweet'].str.replace('.*(?=InSight)', '', regex=True)

To extract the date in the provided format, use str.extract with a positive lookbehind regex (a raw string avoids invalid-escape warnings, and expand=False returns a Series):

df['date'] = df['tweet'].str.extract(r'(?<=-)(\w{3} \d{2})', expand=False)

Output

                                               tweet            text    date
0  Mars Weather@Marsweatherreport-Jul 15InSight s...  InSight sol 58  Jul 15
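
Both steps can also be done in one pass with named groups in str.extract. This is a sketch under assumptions: the sample rows are hypothetical, the date is taken to be a three-letter month plus a one- or two-digit day immediately before 'InSight', and the inline (?s) flag lets . span newlines inside a tweet.

```python
import pandas as pd

# Hypothetical sample rows modeled on the single row shown in the output.
df = pd.DataFrame({"tweet": [
    "Mars Weather@Marsweatherreport-Jul 15InSight sol 58 weather report",
    "Mars Weather@Marsweatherreport-Jul 5InSight sol 48\nsecond line",
    "A tweet without the marker",
]})

# Each named group becomes a column: 'date' and 'text'.
pattern = r"(?s)-(?P<date>\w{3} \d{1,2})(?P<text>InSight.*)"
parts = df["tweet"].str.extract(pattern)

# Rows that do not match get NaN in both new columns.
df = df.join(parts)
print(df[["date", "text"]])
```

Using \d{1,2} instead of \d{2} also picks up single-digit days such as "Jul 5", which may or may not occur in the real data.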
