I am new to Python programming. I am trying to split a column of a DataFrame df and add the result as a new column to the same df. Below is a reproducible sample for reference.
import pandas as pd
import datetime
# Create new dataframe df
df = pd.DataFrame({'sample_date':['2018-01-10','2018-01-10','2018-01-11','2018-01-11','2018-01-12']})
# Get current date. It is 2018-01-12 for me as I write this
today = datetime.date.today()
# Add new column to the df dataframe
df['Today'] = today
# Convert both date columns to datetime64 dtype
df['sample_date'] = pd.to_datetime(df['sample_date'])
df['Today'] = pd.to_datetime(df['Today'])
# Create a new column holding the difference between Today and sample_date
df['Difference'] = df['Today'] - df['sample_date']
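A quick check that may help explain why string methods are awkward here: subtracting two datetime64 columns produces a timedelta64 column, not text. The sketch below rebuilds the sample with "today" fixed to 2018-01-12 (an assumption, so the result does not depend on the current date) and confirms the dtype.

```python
import pandas as pd

# Same sample data; "today" is hard-coded to 2018-01-12 for reproducibility
df = pd.DataFrame({'sample_date': ['2018-01-10', '2018-01-10', '2018-01-11', '2018-01-11', '2018-01-12']})
df['sample_date'] = pd.to_datetime(df['sample_date'])
df['Today'] = pd.to_datetime('2018-01-12')
df['Difference'] = df['Today'] - df['sample_date']

# The difference column holds Timedelta values, not strings like "2 days ..."
print(pd.api.types.is_timedelta64_dtype(df['Difference']))  # True
```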
When I write df to disk as a CSV or TXT file, the output looks like this:
sample_date Today Difference
10-01-2018 12-01-2018 2 days 00:00:00.000000000
10-01-2018 12-01-2018 2 days 00:00:00.000000000
11-01-2018 12-01-2018 1 days 00:00:00.000000000
11-01-2018 12-01-2018 1 days 00:00:00.000000000
12-01-2018 12-01-2018 0 days 00:00:00.000000000
I want to add a new column 'Day' to the same DataFrame df by splitting the 'Difference' column so that only the number before "days" is kept, as shown below.
# Desired output
sample_date Today Difference Day
10-01-2018 12-01-2018 2 days 00:00:00.000000000 2
10-01-2018 12-01-2018 2 days 00:00:00.000000000 2
11-01-2018 12-01-2018 1 days 00:00:00.000000000 1
11-01-2018 12-01-2018 1 days 00:00:00.000000000 1
12-01-2018 12-01-2018 0 days 00:00:00.000000000 0
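Because 'Difference' is already a timedelta64 column, one way to get the 'Day' values without any string handling is the `.dt.days` accessor, which returns the whole-day component of each timedelta as an integer. A minimal sketch, again assuming "today" is fixed to 2018-01-12 to match the sample output:

```python
import pandas as pd

df = pd.DataFrame({'sample_date': ['2018-01-10', '2018-01-10', '2018-01-11', '2018-01-11', '2018-01-12']})
df['sample_date'] = pd.to_datetime(df['sample_date'])
# Assumption: fixed "today" instead of datetime.date.today()
df['Today'] = pd.to_datetime('2018-01-12')
df['Difference'] = df['Today'] - df['sample_date']

# Whole days of each timedelta, as integers
df['Day'] = df['Difference'].dt.days
print(df['Day'].tolist())  # [2, 2, 1, 1, 0]
```

Note that `.dt.days` drops any sub-day part (e.g. 1 day 12:00 gives 1), which matches the desired output here since both columns are plain dates.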
I have tried the str.split() approach from the thread "How to split a column into two columns?", but I get an error and cannot figure out what I am doing wrong. Is there a way to get the desired output? I am using Python 3.6.4. Any help would be appreciated.
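If a string-based split is preferred, the `.str` accessor only works on string data, which is likely why calling it directly on the timedelta column fails. A sketch (an alternative, not the method from the linked thread) that first converts each value to text and then keeps the first token:

```python
import pandas as pd

# Timedelta column equivalent to 'Difference' in the sample
diff = pd.Series(pd.to_timedelta([2, 2, 1, 1, 0], unit='D'))

# Convert each Timedelta to text such as "2 days ...", split on whitespace,
# keep the first token, and turn it back into an integer
day = diff.astype(str).str.split().str[0].astype(int)
print(day.tolist())  # [2, 2, 1, 1, 0]
```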
UPDATE:
I tried the solution provided by @Jonas Byström, but the output is not what I am looking for. Any idea what I may be doing wrong?
# Trying a Solution
df['Difference'] = str(df['Today'] - df['sample_date']).split()[0]
# Output received
sample_date Today Difference
0 2018-01-10 2018-01-12 0
1 2018-01-10 2018-01-12 0
2 2018-01-11 2018-01-12 0
3 2018-01-11 2018-01-12 0
4 2018-01-12 2018-01-12 0
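A likely explanation for the all-zero column: `str()` applied to a whole Series returns its printed table as one string, so `.split()[0]` picks the first whitespace-separated token of that table, which is the index label of the first row ("0"), not a day count. That single scalar is then broadcast to every row. A small sketch reproducing this:

```python
import pandas as pd

s = pd.Series(pd.to_timedelta([2, 2, 1, 1, 0], unit='D'))

# str(s) is the whole printed Series, starting with the first index label
first_token = str(s).split()[0]
print(first_token)  # 0
```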