0

I have the a pandas dataframe in this format:

Dates
11-Feb-18
18-Feb-18
03-Mar-18
25-Mar-18
29-Mar-18
04-Apr-18
08-Apr-18
14-Apr-18
17-Apr-18
30-Apr-18
04-May-18

I want to find dates between two consecutive dates. In this example I want to make a new column which will contain dates between two consecutive dates. For example between 11-Feb-18 and 18-Feb-18, I will get all the dates between these two dates.

I tried this code but it's throwing me error:

pd.DataFrame({'dates':pd.date_range(pd.to_datetime(df_new['Time.[Day]'].loc[i].diff(-1)))})
Anjali
  • 1
  • 1
    could you make a mockup table of your desired output? I'm not sure whether you want the "dates" column to become expanded by all the missing dates, or whether you want a second column that holds all missing dates as a list. – Racooneer Mar 26 '21 at 18:40
  • In any case, this link might help https://stackoverflow.com/questions/7274267/print-all-day-dates-between-two-dates While this is a duplicate, look for the list comprehension in one the secon answer. Imho, its the most elegant way to create a date range – Racooneer Mar 26 '21 at 18:51
  • @Racooner I want to create a new column in the dataframe which will contain all the dates between two consecutive dates – Anjali Mar 27 '21 at 18:05
  • Anjali, have you tried my code? Please, be kind to comment whats wrong. It was tested. The result is a dataframe with dates you want – Alexey Mar 30 '21 at 16:03

2 Answers2

0

This works with dataframes, adding a new column containing the requested list It iterates over the column 1, preparing a list of lists for column 2.

At the and it creates a new dataframe column and assigns the prepared values to it.

import pandas as pd
from pprint import pp
from datetime import datetime, timedelta

df = pd.read_csv("test.csv")
in_betweens = []

for i in range(len(df["dates"])-1):
    d = datetime.strptime(df["dates"][i],"%d-%b-%y")
    d2 = datetime.strptime(df["dates"][i+1],"%d-%b-%y")
        
    d = d + timedelta(days=1)
    in_between = []
    while d < d2:
        in_between.append(d.strftime("%d-%b-%y"))
        d = d + timedelta(days=1)


    in_betweens.append(in_between)
    
in_betweens.append([])
df["in_betwens"] = in_betweens

df.head()
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Mirronelli
  • 740
  • 5
  • 14
0

if you want to add a column with the list of dates tat are missing in between, this shoudl work. This could be more efficient and it has to work around the NaT in the last row and becomes a bit longer as intended, but gives you the result.

import pandas as pd
from datetime import timedelta

test_df = pd.DataFrame({
    "Dates" : 
        ["11-Feb-18", "18-Feb-18", "03-Mar-18", "25-Mar-18", "29-Mar-18", "04-Apr-18",
        "08-Apr-18", "14-Apr-18", "17-Apr-18", "30-Apr-18", "04-May-18"]
    })

res = (
    test_df
        .assign(
            # convert to datetime
            Dates = lambda x : pd.to_datetime(x.Dates),
            # get next rows date
            Dates_next = lambda x : x.Dates.shift(-1),
            # create the date range
            Dates_list = lambda x : x.apply(
                lambda x : 
                    pd.date_range(
                        x.Dates + timedelta(days=1), 
                        x.Dates_next - timedelta(days=1),
                        freq="D").date.tolist()
                    if pd.notnull(x.Dates_next)
                    else None
                , axis = 1
            ))
)
print(res)

results in:

        Dates Dates_next                                         Dates_list
0  2018-02-11 2018-02-18  [2018-02-12, 2018-02-13, 2018-02-14, 2018-02-1...
1  2018-02-18 2018-03-03  [2018-02-19, 2018-02-20, 2018-02-21, 2018-02-2...
2  2018-03-03 2018-03-25  [2018-03-04, 2018-03-05, 2018-03-06, 2018-03-0...
3  2018-03-25 2018-03-29               [2018-03-26, 2018-03-27, 2018-03-28]
4  2018-03-29 2018-04-04  [2018-03-30, 2018-03-31, 2018-04-01, 2018-04-0...
5  2018-04-04 2018-04-08               [2018-04-05, 2018-04-06, 2018-04-07]
6  2018-04-08 2018-04-14  [2018-04-09, 2018-04-10, 2018-04-11, 2018-04-1...
7  2018-04-14 2018-04-17                           [2018-04-15, 2018-04-16]
8  2018-04-17 2018-04-30  [2018-04-18, 2018-04-19, 2018-04-20, 2018-04-2...
9  2018-04-30 2018-05-04               [2018-05-01, 2018-05-02, 2018-05-03]
10 2018-05-04        NaT                                               None

As a sidenote, if you don't need the last row after the analysis, you could filter out the last row after assigning the next date and eliminate the if statement to make it faster.

Racooneer
  • 329
  • 1
  • 2
  • 11