I have to two dataframes. The first one has the index ranging from 3rd of December 2006 to 20th of December 2006 for example. The second one has the dates ranging from 2000 to 2020. I want to drop the rows from the second one when their dates are not within the first one's interval.
Consider the example below:
The first one is this one:
index value
'2006-12-03 13:06:21.955000' 3
'2006-12-03 13:14:54.100000' 4
'2006-12-04 13:23:25.929000' 5
'2006-12-05 13:31:58.074000' 6
'2006-12-05 13:40:29.903000' 7
'2006-12-05 13:49:02.048000' 8
'2006-12-06 13:57:33.877000' 9
.
.
.
'2006-12-20 14:06:06.022000' 100
'2006-12-20 14:14:37.851000' 110
The second one is this one:
id date name
.
.
.
39 2005-08-22 17:27:00 O
40 2005-09-07 17:40:00 F
41 2006-12-05 10:35:00 X
42 2006-12-13 02:40:00 F
43 2010-08-14 10:05:00 F
44 2011-03-07 20:12:00 M
45 2011-06-07 08:03:00 U
46 2011-08-04 04:12:00 M
47 2011-08-09 08:05:00 P
48 2011-09-22 11:01:00 L
49 2011-11-26 07:10:00 N
50 2012-01-23 03:59:00 M
51 2012-01-27 18:37:00 X
.
.
.
The desired one is the second one but edited as follows:
41 2006-12-05 10:35:00 X
42 2006-12-13 02:40:00 F
--> to remain only with the dates that are also present in the first one.
I tried the following command in order to drop the rows based on the condition:
second_df = second_df[(second_df.date < date_start_first) | (second_df.date > date_end_first)]
(I inspired from this answer.)
Unfortunately, the above line of code does not work ...
The date_start_first
and date_end_first
are extracted as follows:
date_start_first = getStartEndDatesOfDataframe(first_df, "start")
date_end_first = getStartEndDatesOfDataframe(first_df, "end")
Using this function:
def getStartEndDatesOfDataframe(dataSeriesName, start_or_end):
if (start_or_end == "start"):
date = dataSeriesName.index[0]
else:
date = dataSeriesName.index[len(dataSeriesName.index)-1]
return date
Could you help me please with this problem?
P.S.: both "dates" have the same type, I verifies using type() function:
print(type(second_df.date[3]), type(first_df.index[3]))
And it gives:
<class 'pandas._libs.tslibs.timestamps.Timestamp'> <class 'pandas._libs.tslibs.timestamps.Timestamp'>