I have three data frames with a similar structure, but their contents differ slightly. Specifically, the datetime fields are formatted differently: in some frames the timestamps are timezone-aware, and in others they are not. I need to find the time range that all three data frames have in common, so that after trimming, every data frame covers exactly the same period.
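The mismatch matters because pandas will not order a tz-naive timestamp against a tz-aware one. A minimal reproduction, using two made-up frames (the column name `start` and all values are hypothetical):

```python
import pandas as pd

# Made-up data: one frame with tz-aware timestamps, one with naive timestamps
aware = pd.DataFrame({"start": pd.to_datetime(["2021-01-01 00:00", "2021-01-02 00:00"]).tz_localize("UTC")})
naive = pd.DataFrame({"start": pd.to_datetime(["2021-01-01 12:00", "2021-01-03 00:00"])})

earliest_aware = aware["start"].min()  # Timestamp carrying tz=UTC
earliest_naive = naive["start"].min()  # Timestamp with no timezone

comparison_error = None
try:
    # Ordering comparisons between naive and aware timestamps are not allowed
    latest_start = max(earliest_aware, earliest_naive)
except TypeError as exc:
    comparison_error = exc
    print(f"TypeError: {exc}")
```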
My plan was to take the earliest start time in each data frame and then take the latest of those, and to do the reverse for the end times (take the latest end time in each data frame, then the earliest of those). However, when I do this I get an error saying that I cannot compare tz-naive and tz-aware timestamps. I've tried a few different fixes, starting with calling tz_convert on the timestamp series, as below:
model_output_dataframes['workRequestSplitEndTime']= pd.to_datetime(model_output_dataframes['workRequestSplitEndTime'], infer_datetime_format=True).tz_convert(None)
This raises the error:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
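For context on this first error: `Series.tz_convert` converts the timezone of the Series' index, not of its values, so on a frame with a default integer index it fails with exactly this message. The values are reached through the `.dt` accessor. A small sketch with made-up data (the variable names and the `America/New_York` zone are assumptions for illustration):

```python
import pandas as pd

# Made-up tz-aware end times standing in for workRequestSplitEndTime
ends = pd.Series(
    pd.to_datetime(["2021-01-01 10:00", "2021-01-02 10:00"]).tz_localize("America/New_York")
)

# Series.tz_convert works on the *index* (a RangeIndex here), not on the values
index_error = None
try:
    ends.tz_convert(None)
except TypeError as exc:
    index_error = exc
    print(f"TypeError: {exc}")

# The .dt accessor works on the values: convert to UTC, then drop the timezone
ends_utc_naive = ends.dt.tz_convert(None)
# .dt.tz_localize(None) instead drops the timezone but keeps the local wall-clock time
ends_local_naive = ends.dt.tz_localize(None)
```

Note that `.dt.tz_convert` itself raises a `TypeError` on naive values, so a column that may be either naive or aware needs a check such as `ends.dt.tz is None` before choosing which call to make.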
Next, I tried converting it into a DatetimeIndex first and then calling tz_convert on that:
model_output_dataframes['workRequestSplitEndTime']= pd.DatetimeIndex(pd.to_datetime(model_output_dataframes['workRequestSplitEndTime'], infer_datetime_format=True)).tz_convert(None)
This raises a different error:
ValueError: cannot reindex from a duplicate axis
At this point I'm stuck: after all these conversions, I feel like I'm back where I started.
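A sketch of the overall overlap computation, assuming that naive timestamps are already in UTC (if they are local time, they would need `.dt.tz_localize(<zone>)` first). `workRequestSplitStartTime` is a hypothetical name for the start column, since only the end column appears above, and the three frames are made-up data. The key step is `pd.to_datetime(..., utc=True)`, which puts every column on the same UTC-aware dtype so the min/max comparisons become legal:

```python
import pandas as pd

# Hypothetical start-column name; only workRequestSplitEndTime appears in the question
START_COL = "workRequestSplitStartTime"
END_COL = "workRequestSplitEndTime"


def to_utc(values: pd.Series) -> pd.Series:
    # utc=True converts aware values to UTC and ASSUMES naive values are UTC,
    # so every frame ends up with the same tz-aware dtype
    return pd.to_datetime(values, utc=True)


def common_window(frames):
    # Latest of the per-frame earliest starts, earliest of the per-frame latest ends
    start = max(to_utc(df[START_COL]).min() for df in frames)
    end = min(to_utc(df[END_COL]).max() for df in frames)
    return start, end


def trim_to_window(df, start, end):
    # Keep only rows that lie entirely inside the shared window
    inside = (to_utc(df[START_COL]) >= start) & (to_utc(df[END_COL]) <= end)
    return df[inside]


# Made-up frames: UTC strings with offsets, naive strings, and New York local time
frames = [
    pd.DataFrame({
        START_COL: ["2021-01-01 00:00+00:00", "2021-01-02 12:00+00:00", "2021-01-04 00:00+00:00"],
        END_COL: ["2021-01-01 06:00+00:00", "2021-01-02 18:00+00:00", "2021-01-05 00:00+00:00"],
    }),
    pd.DataFrame({
        START_COL: ["2021-01-02 00:00", "2021-01-02 12:00", "2021-01-03 00:00"],
        END_COL: ["2021-01-02 06:00", "2021-01-02 18:00", "2021-01-06 00:00"],
    }),
    pd.DataFrame({
        START_COL: pd.to_datetime(["2021-01-01 12:00", "2021-01-02 07:00", "2021-01-03 19:00"]).tz_localize("America/New_York"),
        END_COL: pd.to_datetime(["2021-01-01 18:00", "2021-01-02 13:00", "2021-01-04 07:00"]).tz_localize("America/New_York"),
    }),
]

start, end = common_window(frames)
trimmed = [trim_to_window(df, start, end) for df in frames]
print(start, end, [len(df) for df in trimmed])
```

Normalizing everything to UTC (rather than stripping timezones) keeps the comparison between instants correct even when the frames were recorded in different zones.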
I would appreciate any help you can give me.