0

I am new to Python programming. I am trying to split a column from a dataframe df and add it as a new column to the same df. Below is a sample reproducible code for reference.

import pandas as pd
import datetime

# Create new dataframe df
df = pd.DataFrame({'sample_date':['2018-01-10','2018-01-10','2018-01-11','2018-01-11','2018-01-12']})

# Get current date. It is 2018-01-12 for me as I write this
today = datetime.date.today()

# Add new column to the df dataframe
df['Today'] = today

# Converting all columns to datetime dtype
df['sample_date'] = pd.to_datetime(df['sample_date'])
df['Today'] = pd.to_datetime(df['Today'])

# Creating a new column to get difference of Today and sample_date column
df['Difference'] = df['Today'] - df['sample_date']

When I write the df file to system as csv or txt, I can see the output as shown below.

sample_date Today       Difference
10-01-2018  12-01-2018  2 days 00:00:00.000000000
10-01-2018  12-01-2018  2 days 00:00:00.000000000
11-01-2018  12-01-2018  1 days 00:00:00.000000000
11-01-2018  12-01-2018  1 days 00:00:00.000000000
12-01-2018  12-01-2018  0 days 00:00:00.000000000

I Want to add a new column 'Day' to the same dataframe df by splitting the 'Difference' column in such a way that only the values before days are captured. Something like the one shown below.

# Desired output
sample_date Today       Difference                  Day
10-01-2018  12-01-2018  2 days 00:00:00.000000000   2
10-01-2018  12-01-2018  2 days 00:00:00.000000000   2
11-01-2018  12-01-2018  1 days 00:00:00.000000000   1
11-01-2018  12-01-2018  1 days 00:00:00.000000000   1
12-01-2018  12-01-2018  0 days 00:00:00.000000000   0

I have tried using the str.split() option using the solution provided in this thread - How to split a column into two columns?. But I am getting an error and not able to figure out what I am doing wrong. Is there any way I can get the desired output? I am using Python 3.6.4. Any help would be appreciated.

UPDATE:

I tried the solution provided by @Jonas Byström but the output is not what I am looking for. Any Idea what I may be doing wrong?

# Trying a Solution
df['Difference'] = str(df['Today'] - df['sample_date']).split()[0]

# Output received

  sample_date      Today Difference
0  2018-01-10 2018-01-12          0
1  2018-01-10 2018-01-12          0
2  2018-01-11 2018-01-12          0
3  2018-01-11 2018-01-12          0
4  2018-01-12 2018-01-12          0
Code_Sipra
  • 1,571
  • 4
  • 19
  • 38
  • 1
    I think you need `df['Day'] = df['Difference'].dt.days`, it looks like dupe :( – jezrael Jan 12 '18 at 09:09
  • You can check second answer. – jezrael Jan 12 '18 at 09:12
  • Thanks @jesrael. +1 for your solution. I actually tried your solution too using `df['Day'] = df['Difference'].dt.days`. I got exactly the same output as I desired. I actually tried to get more information on `dt.days` from the pandas documentation [link](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.days.html). But there is not much detail provided there. It would be really great if you could tell me more about this function. – Code_Sipra Jan 12 '18 at 12:59
  • Yes, sure. First thing is for difference of 2 datetiems get [timedeltas](http://pandas.pydata.org/pandas-docs/stable/timedeltas.html) and then is the best use function `dt.days` for convert timedeltas to intergrs as days. Good luck1 – jezrael Jan 12 '18 at 13:01
  • Great, thanks again. I can see `dt.days` is returning an `int64` dtype which is awesome! I needed it to be in this dtype for further processing as is. I would really appreciate it if you could provide the same detail as an answer for rest of the world. Someone out there may be stuck with a similar problem as I did. – Code_Sipra Jan 12 '18 at 13:09
  • I really like do it, but I have problem - this question was already answered before :( So I cannot create answer. So sorry :( – jezrael Jan 12 '18 at 13:10
  • Respect, sir!! Thank you. You taught me something new in Python today. Have a good day. – Code_Sipra Jan 12 '18 at 13:17

1 Answers1

0

Edit:

>>> diff = df['Today'] - df['sample_date']
>>> first_word = lambda td: str(td).split()[0]
>>> df['Difference'] = [first_word(e) for e in diff]
>>> df
  sample_date      Today Difference
0  2018-01-10 2018-01-23         13
1  2018-01-10 2018-01-23         13
2  2018-01-11 2018-01-23         12
3  2018-01-11 2018-01-23         12
4  2018-01-12 2018-01-23         11

Or if you prefer a one-liner:

df['Difference'] = [str(e).split()[0] for e in df['Today'] - df['sample_date']]
Jonas Byström
  • 25,316
  • 23
  • 100
  • 147
  • Thanks for taking the time to help out @Jonas Byström. I tried your solution. Unfortunately, I am not getting the desired result. I have updated by question with this information too for reference if it helps as I am not sure how to provide a dataframe output in the comment section. – Code_Sipra Jan 12 '18 at 12:50