5

I have DataFrame in which there is a column with event dates (dates are not unique). I need to select all the data that is in this period. I try next:

start_day = datetime.date(datetime.strptime(start_day, '%d.%m.%Y')) #change user data to date format

end_day = datetime.date(datetime.strptime(end_day, '%d.%m.%Y'))

df = df[df['Transaction_date'].between(start_day, end_day)]

The type of column Transaction_date is datetime64[ns]. When I run the code with the request for the period 01/01/2020 to 31/01/2020 - part of the data for the specified period is lost (information is displayed only for 21 days, it should be for 31 days).

I see that the Data Frame is contained data for the entire requested period. Can you help me please, where is my mistake?

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Maksim Kostinov
  • 61
  • 1
  • 1
  • 4
  • Can you show the output for `print(df[~df['Transaction_date'].between(start_day, end_day)].shape)` (add after the your code ) – DOOM Feb 27 '20 at 21:24
  • 1
    Does this answer your question? [Filtering Pandas DataFrames on dates](https://stackoverflow.com/questions/22898824/filtering-pandas-dataframes-on-dates) – Raul de Queiroz Feb 27 '20 at 21:49
  • After this: print(df[~df['Transaction_date'].between(start_day, end_day)].shape) I get next output: (0,12) – Maksim Kostinov Feb 28 '20 at 14:37

1 Answers1

4

I used the following test DataFrame:

   Transaction_date  Value
0        2019-12-31     10
1        2020-01-10     10
2        2020-01-15     10
3        2020-01-20     10
4        2020-01-25     10
5        2020-01-28     10
6        2020-01-29     10
7        2020-01-30     10
8        2020-01-31     10
9        2020-02-01     10
10       2020-02-01     10

Start / end dates are:

start_day = '01.01.2020'
end_day = '31.01.2020'

The code is:

# Convert start / end dates to datetime
start_day = pd.to_datetime(start_day)
end_day = pd.to_datetime(end_day)

I noticed that you use datetime module, which I think is a bad practice. To this end use dedicated pandasonic function (to_datetime). It is even clever enough to recognize many of commonly used date formats, so there is no need to specify it on your own.

To get the actual result, run:

df[df['Transaction_date'].between(start_day, end_day)]

The result is:

  Transaction_date  Value
1       2020-01-10     10
2       2020-01-15     10
3       2020-01-20     10
4       2020-01-25     10
5       2020-01-28     10
6       2020-01-29     10
7       2020-01-30     10
8       2020-01-31     10

So as you can see:

  • The date from 2019 has been eliminated.
  • The output contains only January dates, to the very end.
  • Dates from the beginning of February have also been eliminated.

Repeat my example on your computer. Maybe the source of your problem (that the result dous not contain all dates from January) is somewhere else?

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • Thank you for answer, realy help me. With your support I found an error in source data - a different date format in the column 'Transaction_date'. I think that this is exactly my problem. – Maksim Kostinov Feb 28 '20 at 14:28