I want to get a section of a dataframe that meets a certain requirement.
I want to do:
new_df = old_df[old_df.timevariable.date() == thisdateiwant]
Is there an efficient way to do this that works?
the issue here is the .date() part. I've done this before using the same syntax but not with a modifier on the part of old_df. For example of old_df.timevariable is a datetime, then I could match this with a ==datetime
but as I want a date, I need to modify each element in the dataframe, which the syntax doesn't like.
I know I could take it all out and have it loop through with a bunch of variables, but I'm pretty sure this would be much slower. The first code snippet seemed to be the fastest way of doing this (like a WHERE SQL clause), although doesn't seem to work if you need to modify the variable you're comparing (such as .date()).
The old_df is about (900k, 15) in size so I want to get something efficient. Currently, I'm just changing variables and reimporting from SQL which seems to take 5-10 seconds for each date (thisdateiwant
). I presume something in python with the larger initial database will be quicker than this. Typically it returns about 30k rows into new_df for each date.
What is the fastest way of doing this?
Edit
Happy to mark this as a duplicate, I got it working from some code in that other question (from @Pault). basically did:
mask = old_db['timevariable'] >= thisdateiwant
mask2 = old_db['timevariable'] < thisdateiwant (+1day)
new_db = old_db.loc[mask]
new_db = new_db.loc[mask2]
I don't think there's an easy way to do both masks at the same time, seemed to throw and error. It's nice and quick so I'm happy.