
I have a numpy array which I wish to filter by datetime. I currently compare an input datetime range (start and end) against the dataframe's columns like so:

    if trim:
        columns = input_hdf.columns.get_level_values(0)
        print(str(columns))
        print(start)
        print(end)
        if start is not None and end is not None:
            mask = (columns >= start) & (columns <= end)
        elif start is not None:
            mask = (columns >= start)
        elif end is not None:
            mask = (columns <= end)
        else:
            # Should never reach this point, but just in case: select every column unchanged
            mask = slice(None)
        input_hdf = input_hdf.loc[:, mask]

However, I'd like to add functionality for start and end to be specified as a "day of year", where the year is irrelevant to the comparison. For example, if the day is later than 1st October then exclude it, be it 2001 or 2021.

I am currently converting an integer value into datetime via:

    start = datetime.strptime(start, '%d-%m-%Y') if start else None

This gives a default year of 1900, which then becomes part of the comparison.

Harry Adams
  • https://stackoverflow.com/help/minimal-reproducible-example – Woody Pride Aug 15 '19 at 14:11
  • Numpy arrays usually contain a single type across all rows and columns (unless your array is entirely datetime) while Pandas data frames contain different types across columns. Unclear what data you are originally working with. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait Aug 15 '19 at 14:26
  • You say Numpy array, but it looks like you're using pandas? I'm not a pandas expert but it has a ton of built-in functionality for time series manipulation and date comparison. – Iguananaut Aug 15 '19 at 14:26

1 Answer


pandas has much better support for dates and times. This answer takes advantage of the fact that date strings in the form mm-dd sort in calendar order, so they can be compared without the year:

    import pandas as pd

    # dates: your ndarray of datetime64 values
    s = pd.Series(dates, index=dates).dt.strftime('%m-%d')

    # Select between Oct 1 and Dec 31 of all years
    cond = ('10-01' <= s) & (s <= '12-31')
    selected = s[cond].index.values
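
The same idea can probably be applied directly to the column mask from the question. Below is a minimal sketch, assuming the first column level of `input_hdf` holds datetimes and that `start` and `end` are passed as 'mm-dd' strings. The sample frame and the helper name `trim_by_month_day` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for input_hdf: datetime column labels spanning two years
columns = pd.to_datetime(['2001-09-30', '2001-10-01', '2001-12-31',
                          '2021-03-15', '2021-10-02'])
input_hdf = pd.DataFrame(np.arange(10).reshape(2, 5), columns=columns)


def trim_by_month_day(df, start=None, end=None):
    """Keep columns whose 'mm-dd' label lies in [start, end]; None means unbounded."""
    # Format each column date as 'mm-dd' so the year drops out of the comparison
    md = df.columns.get_level_values(0).strftime('%m-%d')
    # Start from an all-True mask so a missing bound leaves the data untouched
    mask = np.ones(len(md), dtype=bool)
    if start is not None:
        mask &= np.asarray(md >= start)
    if end is not None:
        mask &= np.asarray(md <= end)
    return df.loc[:, mask]


# Keep Oct 1 through Dec 31 of every year
trimmed = trim_by_month_day(input_hdf, start='10-01', end='12-31')
```

Comparing 'mm-dd' strings rather than the numeric day of year keeps a given calendar date on the same value in leap years. A range that wraps past New Year (say Nov 1 to Feb 28) would need the two conditions combined with `|` instead of `&`.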
Code Different