
I've been trying various approaches for a while, but I can't get what I need. The initial data frame is:

Number   Date 1          Date 2
1        2020/04/20      2021/03/04
2        2020/04/20      2021/03/04
3        2020/04/20      2021/03/04
1        2020/05/26      2021/08/04
2        2020/05/26      2021/08/04
3        2020/05/26      2021/08/04

When the Number column is equal to 1, I want to mix the dates: take the year from Date 1 and the month and day from Date 2, and leave the remaining rows empty (0).

What I would like to have:

Number   Date 1          Date 2        Date 3
1        2020/04/20      2021/03/04     2020/03/04
2        2020/04/20      2021/03/04     0
3        2020/04/20      2021/03/04     0
1        2020/05/26      2021/08/04     2020/08/04
2        2020/05/26      2021/08/04     0
3        2020/05/26      2021/08/04     0

Then, in the empty rows, add one year to the previous date, thus:

Number   Date 1          Date 2        Date 3
1        2020/04/20      2021/03/04     2020/03/04
2        2020/04/20      2021/03/04     2021/03/04
3        2020/04/20      2021/03/04     2022/03/04
1        2020/05/26      2021/08/04     2020/08/04
2        2020/05/26      2021/08/04     2021/08/04
3        2020/05/26      2021/08/04     2022/08/04

I have tried the following:

df['Date 3'] = np.where(df['Number'] == 1, df['Date 1'].dt.year + df['Date 2'].dt.month + df['Date 2'].dt.day, 0)

but this adds the year, month, and day together as numbers, which is not what I want.

llamaking136
Katherine

3 Answers


The original question:

"When the number column is equal to 1, I want to mix the dates: take the year from date 1 and the month and day from date 2, and the rest of the empty rows."

"and in the empty spaces add one year to the previous date"

You can do that with:

from datetime import date
import pandas as pd

# Year from Date 1 shifted by (Number - 1); month and day from Date 2
df['Date 3'] = [
    date(
        pd.to_datetime(df['Date 1'][i]).year + df['Number'][i] - 1,
        pd.to_datetime(df['Date 2'][i]).month,
        pd.to_datetime(df['Date 2'][i]).day
    ).strftime('%Y/%m/%d')
    for i, _ in df.iterrows()
]
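
For larger frames, the same result can probably be reached without a Python-level loop. The following is only a sketch, assuming the date columns hold strings in YYYY/MM/DD format as in the question; the helper frame `parts` and the names `d1` and `d2` are introduced here for illustration. It relies on pd.to_datetime being able to assemble dates from a frame with year, month, and day columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [1, 2, 3, 1, 2, 3],
    'Date 1': ['2020/04/20'] * 3 + ['2020/05/26'] * 3,
    'Date 2': ['2021/03/04'] * 3 + ['2021/08/04'] * 3,
})

# Parse the string columns once (format assumed from the question).
d1 = pd.to_datetime(df['Date 1'], format='%Y/%m/%d')
d2 = pd.to_datetime(df['Date 2'], format='%Y/%m/%d')

# Year from Date 1 shifted by (Number - 1); month and day from Date 2.
parts = pd.DataFrame({
    'year': d1.dt.year + df['Number'] - 1,
    'month': d2.dt.month,
    'day': d2.dt.day,
})
df['Date 3'] = pd.to_datetime(parts).dt.strftime('%Y/%m/%d')
print(df)
```

As with date(...), a February 29 in Date 2 combined with a non-leap year would raise an error, so that case would need separate handling.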

But the author later posted in the comments section:

"I am trying a modification. When the Number column is 1, add a year to it. When it is 2, add 6 months to it. When it is 3, add 3 months to it."

In this case, do:

from datetime import date
from dateutil.relativedelta import relativedelta
import pandas as pd

# Integer division keeps the month count a whole number: 12, 6, 3
df['Date 4'] = [
    (date(
        pd.to_datetime(df['Date 1'][i]).year,
        pd.to_datetime(df['Date 2'][i]).month,
        pd.to_datetime(df['Date 2'][i]).day
    ) + relativedelta(months=int(12 // 2**(df['Number'][i] - 1)))
    ).strftime('%Y/%m/%d')
    for i, _ in df.iterrows()
]

Notes:

  • It is not clear to me whether the interval should be added to the year of Date 1 or of Date 2; I added it to the year of Date 1.
  • The months argument, 12 divided by 2**(Number - 1), gives the interval: 12, 6, or 3 months.
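
To make the interval arithmetic in the second note concrete, here is a small sketch. The base date is taken from the first row of the example; the names `base`, `intervals`, and `shifted` are introduced here for illustration only.

```python
from datetime import date
from dateutil.relativedelta import relativedelta

base = date(2020, 3, 4)  # year from Date 1, month and day from Date 2

# Number -> months to add: 1 -> 12, 2 -> 6, 3 -> 3
intervals = {n: 12 // 2 ** (n - 1) for n in (1, 2, 3)}

# Apply each interval to the same base date.
shifted = {n: base + relativedelta(months=m) for n, m in intervals.items()}
for n in (1, 2, 3):
    print(n, intervals[n], shifted[n])
```

The formula only halves the interval at each step, so it fits the 12/6/3 pattern from the comment; a different set of intervals would be easier to express as an explicit dictionary.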

Example:

import io
import pandas as pd

dftxt = """
Number   Date 1          Date 2
1        2020/04/20      2021/03/04
2        2020/04/20      2021/03/04
3        2020/04/20      2021/03/04
1        2020/05/26      2021/08/04
2        2020/05/26      2021/08/04
3        2020/05/26      2021/08/04
"""
# strip() drops the leading blank line, so the header is the first row
df = pd.read_fwf(io.StringIO(dftxt.strip()), colspecs='infer', header=0)

df:

    Number  Date 1      Date 2
0   1       2020/04/20  2021/03/04
1   2       2020/04/20  2021/03/04
2   3       2020/04/20  2021/03/04
3   1       2020/05/26  2021/08/04
4   2       2020/05/26  2021/08/04
5   3       2020/05/26  2021/08/04

Output 1:

    Number  Date 1      Date 2      Date 3
0   1       2020/04/20  2021/03/04  2020/03/04
1   2       2020/04/20  2021/03/04  2021/03/04
2   3       2020/04/20  2021/03/04  2022/03/04
3   1       2020/05/26  2021/08/04  2020/08/04
4   2       2020/05/26  2021/08/04  2021/08/04
5   3       2020/05/26  2021/08/04  2022/08/04

Output 2:

    Number  Date 1      Date 2      Date 3      Date 4
0   1       2020/04/20  2021/03/04  2020/03/04  2021/03/04
1   2       2020/04/20  2021/03/04  2021/03/04  2020/09/04
2   3       2020/04/20  2021/03/04  2022/03/04  2020/06/04
3   1       2020/05/26  2021/08/04  2020/08/04  2021/08/04
4   2       2020/05/26  2021/08/04  2021/08/04  2021/02/04
5   3       2020/05/26  2021/08/04  2022/08/04  2020/11/04
Naivre
  • I am trying a modification. When the Number column is 1, add a year to it. When it is 2, add 6 months to it. When it is 3, add 3 months to it. But it doesn't add them for me – Katherine Feb 01 '21 at 22:58
  • Your original question doesn't specify that. You should edit your question or create a new one. For now, I think my answer matches your question. – Naivre Feb 01 '21 at 23:12
  • Yes, @Katherine, I think it is better to edit your question restating what you want to solve. – Fabio Mendes Soares Feb 01 '21 at 23:23

The way to go is to use a lambda function inside the apply method. It checks whether the Number column equals 1: if so, it writes the value you want; otherwise, it writes a null (NaT).

df['Date 3'] = df.apply(
    lambda x: datetime(x['Date 1'].year, x['Date 2'].month, x['Date 2'].day)
    if x['Number'] == 1 else np.datetime64('NaT'),
    axis=1)

To fill the null values with the previous dates, you can forward-fill the column with ffill:

df['Date 3'] = df['Date 3'].ffill()
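
As a quick illustration of what forward filling does to a date column with gaps, here is a minimal sketch on a standalone Series; the names `s` and `filled` are introduced here. It uses Series.ffill(), which does the same thing as fillna(method='ffill'); recent pandas versions deprecate the method argument, so the shorter spelling is likely the safer one.

```python
import pandas as pd

# A date column where only the rows with Number == 1 are filled in.
s = pd.Series(pd.to_datetime(
    ['2020-03-04', None, None, '2020-08-04', None, None]))

# Each NaT takes the last valid date that appears above it.
filled = s.ffill()
print(filled)
```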

To add an extra year to each following row, you can use almost the same lambda function. Since it computes Date 3 directly from Number, it also works without the two previous steps:

df['Date 3'] = df.apply(
    lambda x: datetime(x['Date 1'].year + x['Number'] - 1, x['Date 2'].month, x['Date 2'].day),
    axis=1)

Here is the full code:

from datetime import datetime
import pandas as pd
import numpy as np

## I'll leave the DataFrame creation for those who want to debug
df = pd.DataFrame([[1, datetime(2020, 4, 20), datetime(2021, 3, 4)],
                   [2, datetime(2020, 4, 20), datetime(2021, 3, 4)],
                   [3, datetime(2020, 4, 20), datetime(2021, 3, 4)],
                   [1, datetime(2020, 5, 26), datetime(2021, 8, 4)],
                   [2, datetime(2020, 5, 26), datetime(2021, 8, 4)],
                   [3, datetime(2020, 5, 26), datetime(2021, 8, 4)]],
                  columns=['Number', 'Date 1', 'Date 2'])

## Step 1: combine the dates where Number == 1, NaT elsewhere
df['Date 3'] = df.apply(
    lambda x: datetime(x['Date 1'].year, x['Date 2'].month, x['Date 2'].day)
    if x['Number'] == 1 else np.datetime64('NaT'),
    axis=1)

## Step 2: fill the nulls with the previous date
df['Date 3'] = df['Date 3'].ffill()

## Step 3: add (Number - 1) years to each row
df['Date 3'] = df.apply(
    lambda x: datetime(x['Date 1'].year + x['Number'] - 1, x['Date 2'].month, x['Date 2'].day),
    axis=1)

The resulting dataframe is

>>> df
   Number     Date 1     Date 2     Date 3
0       1 2020-04-20 2021-03-04 2020-03-04
1       2 2020-04-20 2021-03-04 2021-03-04
2       3 2020-04-20 2021-03-04 2022-03-04
3       1 2020-05-26 2021-08-04 2020-08-04
4       2 2020-05-26 2021-08-04 2021-08-04
5       3 2020-05-26 2021-08-04 2022-08-04
Fabio Mendes Soares
  • I am trying a modification. When the Number column is 1, add a year to it. When it is 2, add 6 months to it. When it is 3, add 3 months to it. But it doesn't add them for me – Katherine Feb 01 '21 at 21:02
  • You can still handle that in a lambda function, but I also found it easier to add years, months, or days to a date directly. I came across this thread https://stackoverflow.com/questions/546321/how-do-i-calculate-the-date-six-months-from-the-current-date-using-the-datetime . Using the relativedelta function you can just add it: lambda x: x['Date 3']+relativedelta(months=6) if x['Number']==2 else x['Date 3']+relativedelta(months=3) if x['Number']==3. Sorry, I don't know how to format a comment, but I think you get the idea. – Fabio Mendes Soares Feb 01 '21 at 23:22

Try using date to build a new date from Date 1 and Date 2.

from datetime import date
df['Date 3'] = [
  date(df['Date 1'][i].year, df['Date 2'][i].month, df['Date 2'][i].day) 
  if (df['Number'][i] == 1) 
  else
  0
  for i in range(len(df))
]

print(df)

output:

   Number      Date 1      Date 2      Date 3
      1    2020-04-20  2021-03-04  2020-03-04
      2    2020-04-20  2021-03-04           0
      3    2020-04-20  2021-03-04           0
      1    2020-05-26  2021-08-04  2020-08-04
      2    2020-05-26  2021-08-04           0
      3    2020-05-26  2021-08-04           0
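
This matches the intermediate table from the question. One possible way to get from there to the final table is sketched below, assuming that Date 1 and Date 2 are datetime columns and that every block of rows starts with a row where Number is 1. The names `is_first`, `base`, `block`, and `offset` are introduced here for illustration, and NaT is used in place of 0 so the column stays a date column.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [1, 2, 3, 1, 2, 3],
    'Date 1': pd.to_datetime(['2020-04-20'] * 3 + ['2020-05-26'] * 3),
    'Date 2': pd.to_datetime(['2021-03-04'] * 3 + ['2021-08-04'] * 3),
})

# Base date only where Number == 1; NaT elsewhere (instead of 0).
is_first = df['Number'] == 1
base = pd.to_datetime(pd.DataFrame({
    'year': df['Date 1'].dt.year,
    'month': df['Date 2'].dt.month,
    'day': df['Date 2'].dt.day,
})).where(is_first)

# Each Number == 1 row starts a new block; count the rows inside each block.
block = is_first.cumsum()
offset = df.groupby(block).cumcount()  # 0, 1, 2, 0, 1, 2

# Carry the base date forward, then add one year per step within the block.
base = base.ffill()
df['Date 3'] = [b + pd.DateOffset(years=int(k)) for b, k in zip(base, offset)]
print(df)
```

Counting rows within each block, rather than using Number - 1 directly, means the sketch does not depend on Number running exactly 1, 2, 3.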