
I have a dataframe which has a timestamp column in seconds since epoch format. It has the dtype float.

I want to filter the dataframe by a specific time window.

Approach:

zombieData[(zombieData['record-ts'] > period_one_start) & (zombieData['record-ts'] < period_one_end)]

This returns an empty dataframe. I can confirm that the data contains timestamps before, after, and within my time window. I calculate my boundary timestamps with the following method:

period_one_start = datetime.strptime('2020-12-06 03:30:00', '%Y-%m-%d %H:%M:%S').timestamp()

I'd be glad for any help. I guess my filtering logic is wrong, which confuses me, because filtering on a single condition (e.g. everything after the start time) works.

Thanks for your help!

HmmRfa
  • Don't use the `datetime` package. Use pandas' [`datetime` type](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html). – Quang Hoang Dec 22 '20 at 18:27
  • Possible duplicate [from here](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates). You can use pandas [`between`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.between.html): `zombieData[zombieData['record-ts'].between('2020-12-01', '2020-12-04')]`. – Cainã Max Couto-Silva Dec 22 '20 at 21:29

2 Answers


This looks messy, but I highly recommend it. Converting to pd.Timestamp before comparing is the most robust way to ensure a correct comparison, and calling the pandas comparison methods gt and lt may compute a little quicker in many situations (especially for larger dataframes). Note that this assumes the record-ts column already has a datetime dtype; a float column cannot be compared with a pd.Timestamp directly.

zombieData[zombieData['record-ts'].gt(pd.Timestamp('2020-12-06')) &
           zombieData['record-ts'].lt(pd.Timestamp('2020-12-09'))]
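For the float column described in the question, the conversion step might look like the minimal sketch below. It assumes `record-ts` holds seconds since the epoch in UTC; the sample values and the names `ts`, `mask` and `window` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: float epoch seconds, one row per day,
# starting at 2020-12-01 00:00:00 UTC (1606780800).
zombieData = pd.DataFrame(
    {"record-ts": [1606780800.0 + 86400.0 * i for i in range(12)]}
)

# Convert the float column to datetime64 so it can be compared with pd.Timestamp.
ts = pd.to_datetime(zombieData["record-ts"], unit="s")

# Keep rows strictly after the start and strictly before the end.
mask = ts.gt(pd.Timestamp("2020-12-06")) & ts.lt(pd.Timestamp("2020-12-09"))
window = zombieData[mask]
print(window)
```

The original float column is left untouched here; only the mask is built from the converted values.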

New Option: I learned of the between method. I think this is easier to read.

zombieData[zombieData['record-ts'].between(left=pd.Timestamp('2020-12-06'),
                                           right=pd.Timestamp('2020-12-09'),
                                           inclusive="neither")]
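To see what the `inclusive` argument changes, here is a small sketch on made-up daily timestamps. It assumes pandas 1.3 or newer, where `inclusive` accepts the strings "both", "neither", "left" and "right"; the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical data: six daily timestamps, 2020-12-05 through 2020-12-10.
ts = pd.Series(pd.date_range("2020-12-05", periods=6, freq="D"))

left = pd.Timestamp("2020-12-06")
right = pd.Timestamp("2020-12-09")

# Open interval: both boundary dates are excluded.
inside_open = ts[ts.between(left, right, inclusive="neither")]

# Default behaviour: both boundary dates are included.
inside_closed = ts[ts.between(left, right)]

print(len(inside_open), len(inside_closed))
```

With `inclusive="neither"` the result matches the strict `>` and `<` comparison used in the question.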
    
ak_slick
import pandas as pd
from datetime import datetime
import numpy as np

# Sample data: 12 consecutive days starting 2020-12-01 (datetime64 values)
date = np.array('2020-12-01', dtype=np.datetime64)
dates = date + np.arange(12)

# Epoch-second bounds as computed in the question (not used by the filter below)
period_one_start = datetime.strptime('2020-12-06 03:30:00', '%Y-%m-%d %H:%M:%S').timestamp()
period_one_end = datetime.strptime('2020-12-09 03:30:00', '%Y-%m-%d %H:%M:%S').timestamp()

zombieData = pd.DataFrame(data={"record-ts": dates})
# A datetime64 column can be compared directly with date strings
zombieData[(zombieData['record-ts'] > '2020-12-06') & (zombieData['record-ts'] < '2020-12-09')]

(if you want to keep your format)
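If the column should stay as float epoch seconds, the bounds can be built as floats too. One thing worth checking: `datetime.strptime(...).timestamp()` on a naive datetime interprets it in the machine's local time zone, so the bounds may be shifted relative to UTC data. Whether that explains the empty result in the question is only a guess. The sketch below assumes the column is in UTC and uses made-up sample values.

```python
import pandas as pd

# Hypothetical sample: float epoch seconds, one row per day,
# starting at 2020-12-01 00:00:00 UTC (1606780800).
zombieData = pd.DataFrame(
    {"record-ts": [1606780800.0 + 86400.0 * i for i in range(12)]}
)

# Time-zone-aware bounds, so the result does not depend on the local time zone.
period_one_start = pd.Timestamp("2020-12-06 03:30:00", tz="UTC").timestamp()
period_one_end = pd.Timestamp("2020-12-09 03:30:00", tz="UTC").timestamp()

# Same two-condition filter as in the question, on the float column.
window = zombieData[
    (zombieData["record-ts"] > period_one_start)
    & (zombieData["record-ts"] < period_one_end)
]
print(window)
```

It is also worth printing both bounds to confirm that the start is smaller than the end before filtering.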

InLaw