0

I would like to select a subset of rows from my Data Frame within a specified range of timestamps.

This below is what I have done but it does not seem to be working. I am trying to select the subset of rows that take place between 01/01/2019 01:00 and 01/01/2021 01:00.

Thank you!

df.loc[(df['TIMESTAMP'] == "01/01/2019 01:00"): (df['TIMESTAMP'] == "01/01/2021 01:00")]
student214
  • 95
  • 9

1 Answers1

1

I created a test DataFrame as:

          TIMESTAMP  Amount
0  01/01/2019 00:00   101.0
1  01/01/2019 01:00   102.0
2  01/01/2019 02:00   103.0
3  01/05/2019 01:00   104.0
4  02/02/2020 00:00   105.0
5  01/01/2021 00:00   106.0
6  01/01/2021 01:00   107.0
7  01/01/2021 02:00   108.0
8  01/02/2021 01:00   109.0

Note that TIMESTAMP column is of object type (actually it holds strings). As you used "/" as separators, I assume US date formatting, i.e. mm/dd/yyyy.

The first step is to convert TIMESTAMP column to datetime type:

df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP)

If you print df now, it will contain:

            TIMESTAMP  Amount
0 2019-01-01 00:00:00   101.0
1 2019-01-01 01:00:00   102.0
2 2019-01-01 02:00:00   103.0
3 2019-01-05 01:00:00   104.0
4 2020-02-02 00:00:00   105.0
5 2021-01-01 00:00:00   106.0
6 2021-01-01 01:00:00   107.0
7 2021-01-01 02:00:00   108.0
8 2021-01-02 01:00:00   109.0

(this is the way how Pandas formats timestamps).

And now, to get rows within your expected datetime range, run:

result = df.query('TIMESTAMP.between("2019-01-01 01:00", "2021-01-01 01:00")')

Note that although TIMESTAMP column is now of datetime type, you can specify datetimes as strings.

The result is:

            TIMESTAMP  Amount
1 2019-01-01 01:00:00   102.0
2 2019-01-01 02:00:00   103.0
3 2019-01-05 01:00:00   104.0
4 2020-02-02 00:00:00   105.0
5 2021-01-01 00:00:00   106.0
6 2021-01-01 01:00:00   107.0
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41