Splitting column of dataframe to a new column in same dataframe in Pandas

Question

I am new to Python programming. I am trying to split a column from a dataframe df and add it as a new column to the same df. Below is a sample reproducible code for reference.

import pandas as pd
import datetime

# Create new dataframe df
df = pd.DataFrame({'sample_date':['2018-01-10','2018-01-10','2018-01-11','2018-01-11','2018-01-12']})

# Get current date. It is 2018-01-12 for me as I write this
today = datetime.date.today()

# Add new column to the df dataframe
df['Today'] = today

# Converting all columns to datetime dtype
df['sample_date'] = pd.to_datetime(df['sample_date'])
df['Today'] = pd.to_datetime(df['Today'])

# Creating a new column to get difference of Today and sample_date column
df['Difference'] = df['Today'] - df['sample_date']

When I write the df file to system as csv or txt, I can see the output as shown below.

sample_date Today       Difference
10-01-2018  12-01-2018  2 days 00:00:00.000000000
10-01-2018  12-01-2018  2 days 00:00:00.000000000
11-01-2018  12-01-2018  1 days 00:00:00.000000000
11-01-2018  12-01-2018  1 days 00:00:00.000000000
12-01-2018  12-01-2018  0 days 00:00:00.000000000

I Want to add a new column 'Day' to the same dataframe df by splitting the 'Difference' column in such a way that only the values before days are captured. Something like the one shown below.

# Desired output
sample_date Today       Difference                  Day
10-01-2018  12-01-2018  2 days 00:00:00.000000000   2
10-01-2018  12-01-2018  2 days 00:00:00.000000000   2
11-01-2018  12-01-2018  1 days 00:00:00.000000000   1
11-01-2018  12-01-2018  1 days 00:00:00.000000000   1
12-01-2018  12-01-2018  0 days 00:00:00.000000000   0

I have tried using the str.split() option using the solution provided in this thread - How to split a column into two columns?. But I am getting an error and not able to figure out what I am doing wrong. Is there any way I can get the desired output? I am using Python 3.6.4. Any help would be appreciated.

UPDATE:

I tried the solution provided by @Jonas Byström but the output is not what I am looking for. Any Idea what I may be doing wrong?

# Trying a Solution
df['Difference'] = str(df['Today'] - df['sample_date']).split()[0]

# Output received

  sample_date      Today Difference
0  2018-01-10 2018-01-12          0
1  2018-01-10 2018-01-12          0
2  2018-01-11 2018-01-12          0
3  2018-01-11 2018-01-12          0
4  2018-01-12 2018-01-12          0

I think you need `df['Day'] = df['Difference'].dt.days`, it looks like dupe :( — jezrael, Jan 12 '18 at 09:09
Thanks @jesrael. +1 for your solution. I actually tried your solution too using `df['Day'] = df['Difference'].dt.days`. I got exactly the same output as I desired. I actually tried to get more information on `dt.days` from the pandas documentation [link](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.days.html). But there is not much detail provided there. It would be really great if you could tell me more about this function. — Code_Sipra, Jan 12 '18 at 12:59
Yes, sure. First thing is for difference of 2 datetiems get [timedeltas](http://pandas.pydata.org/pandas-docs/stable/timedeltas.html) and then is the best use function `dt.days` for convert timedeltas to intergrs as days. Good luck1 — jezrael, Jan 12 '18 at 13:01
Great, thanks again. I can see `dt.days` is returning an `int64` dtype which is awesome! I needed it to be in this dtype for further processing as is. I would really appreciate it if you could provide the same detail as an answer for rest of the world. Someone out there may be stuck with a similar problem as I did. — Code_Sipra, Jan 12 '18 at 13:09
I really like do it, but I have problem - this question was already answered before :( So I cannot create answer. So sorry :( — jezrael, Jan 12 '18 at 13:10
Respect, sir!! Thank you. You taught me something new in Python today. Have a good day. — Code_Sipra, Jan 12 '18 at 13:17

Jonas Byström · Answer 1 · 2018-01-23T07:30:34.493

0

Edit:

>>> diff = df['Today'] - df['sample_date']
>>> first_word = lambda td: str(td).split()[0]
>>> df['Difference'] = [first_word(e) for e in diff]
>>> df
  sample_date      Today Difference
0  2018-01-10 2018-01-23         13
1  2018-01-10 2018-01-23         13
2  2018-01-11 2018-01-23         12
3  2018-01-11 2018-01-23         12
4  2018-01-12 2018-01-23         11

Or if you prefer a one-liner:

df['Difference'] = [str(e).split()[0] for e in df['Today'] - df['sample_date']]

edited Jan 23 '18 at 07:30

answered Jan 12 '18 at 09:09

Jonas Byström

25,316
23
100
147

Thanks for taking the time to help out @Jonas Byström. I tried your solution. Unfortunately, I am not getting the desired result. I have updated by question with this information too for reference if it helps as I am not sure how to provide a dataframe output in the comment section. – Code_Sipra Jan 12 '18 at 12:50

Splitting column of dataframe to a new column in same dataframe in Pandas

1 Answers1