0

How to create a new dataframe based on date interval from an existing dataframe:

df=pd.DataFrame([["01.01.18",32],
    ["01.01.18",8],
    ["01.01.18",12],
    ["01.01.18",77],
    ["02.01.18",99],
    ["03.01.18",78],
    ["04.01.18",89],
    ["02.02.18",85],
    ["10.03.18",35],
    ["13.04.18",81],
    ["03.02.18",97],
    ["29.03.18",90],
    ["08.04.18",7]],columns=["date","payment"])

How do I create a dataframe with date values between 01.01.18 and 31.01.18, so the new df would look like:

Date Payment 
01.01.18,   32
01.01.18,   8
01.01.18,   12
01.01.18,   77
02.01.18,   99
03.01.18,   78
04.01.18,   89
user40
  • 1,361
  • 5
  • 19
  • 34
  • What would you want as `payment` for days which are not in your existing dataframe? It might help if you can show a sample of your expected output. – jpp May 30 '18 at 17:03
  • @jpp updated the question – user40 May 30 '18 at 17:07
  • `df[(df.date >= "01.01.18") & (df.date <= "31.01.18")]`? – rafaelc May 30 '18 at 17:08
  • You can look at the marked duplicate. Note that `.ix` has been depreciated, so use `.loc`. – jpp May 30 '18 at 17:09
  • @jpp well, every single answer in that question has .ix solution (quite confusing since it was depreciated), may be it worth updating the question or create another one, like this. – user40 May 30 '18 at 17:15
  • @user40, When I have some time, I'll go through them and stick a banner on top :). But for now, I always make sure when I mark as I duplicate I add a comment so users don't get confused. I've also added another recent duplicate. – jpp May 30 '18 at 17:16

1 Answers1

0

You need:

df.set_index(pd.to_datetime(df['date'])).loc['2018-01-01':'2018-01-31'].reset_index(drop=True)

Output:

        date    payment
0   01.01.18    32
1   01.01.18    8
2   01.01.18    12
3   01.01.18    77

You can keep using your original dateformat as well with below code and achive same output:

import datetime as dt
start = dt.datetime.strptime('01.01.18', '%d.%m.%y').strftime('%Y-%m-%d')
end = dt.datetime.strptime('31.01.18', '%d.%m.%y').strftime('%Y-%m-%d')
df.set_index(pd.to_datetime(df['date'])).loc[start:end].reset_index(drop=True)
harvpan
  • 8,571
  • 2
  • 18
  • 36