
I have a dataframe with a datetime index, and I'd like to get the number of minutes remaining until 4:00 PM (or 16:00) for each row's day, using a column calculation.

Using the answer from this post, we can create an empty dataframe and assign some random datetimes to its index:

import numpy as np
import pandas as pd

def random_datetimes_or_dates(start, end, out_format='datetime', n=10):
    (divide_by, unit) = (10**9, 's') if out_format=='datetime' else (24*60*60*10**9, 'D')
    start_u = start.value//divide_by
    end_u = end.value//divide_by
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit=unit)

start = pd.to_datetime('2019-01-01')
end = pd.to_datetime('2021-12-31')

index = random_datetimes_or_dates(start, end, out_format='datetime')
df = pd.DataFrame(index=index)

As an example, if the datetime at index n is 2021-11-29 15:30:00, then the value in the new column for that row should read 30. If it's after 16:00, it's OK for the number to be negative.

What I had initially tried was this:

df['Minutes_Until_4PM'] = datetime.strptime("1600", "%H%M").time() - df.index.time()

...but this gives the error:

TypeError: 'numpy.ndarray' object is not callable

...which is fine, but I'm not sure I'm going about this the right way, and the error might just be an artifact of the reproducible code I've provided. Either way, you can see what I'm trying to do: compute this WITHOUT using a for loop.

martineau
wildcat89
  • What is the full error message with traceback? – mkrieger1 Dec 28 '21 at 00:58
  • `df['Minutes_Until_4PM'] = datetime.strptime("1600", "%H%M").time() - df.index.time()` `TypeError: 'numpy.ndarray' object is not callable` That is the full traceback – wildcat89 Dec 28 '21 at 01:01
  • And which of these expressions does it refer to? (check by executing them individually) – mkrieger1 Dec 28 '21 at 01:02
  • If you run the code, you'll see the error, the `df['Minutes_Until_4PM'] = datetime.strptime("1600", "%H%M").time() - df.index.time()` is the line that causes the error... – wildcat89 Dec 28 '21 at 01:03
  • I can't run the code. Do you get the error when you execute `datetime.strptime(...)` or `datetime.strptime(...).time()` or `df.index.time()`? – mkrieger1 Dec 28 '21 at 01:04
  • Oh I see, it'll be the `df.index.time()` that causes it because `df.index` is technically an array, so in order to get one specific time from the index, you'd have to use `df.index[n]`, but that would mean iterating through the df, and I want to use a column calculation instead. – wildcat89 Dec 28 '21 at 01:05
  • Since as you said you *don't* want to get one specific time and write a loop, why don't you try to subtract the entire array instead of trying to call it like a function (i.e. omit the `()`)? – mkrieger1 Dec 28 '21 at 01:07
  • `TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'` – wildcat89 Dec 28 '21 at 01:09
  • What should happen for datetimes after 16:00 on that day? Return a negative value, or count the number of minutes until the next 16:00? – Daniel P Dec 28 '21 at 01:39
  • @DanielP I updated the question to say that it's ok if the number is a negative to make things easier. In my code, I just run a check for negative numbers, so that's fine. i.e. if the datetime is `2021-11-01 16:05:00` the value would be `-5`, that's fine – wildcat89 Dec 28 '21 at 01:43
  • OK, and why not use a for loop? Would an apply on a Series be fine, or does it have to be vectorized? – Daniel P Dec 28 '21 at 02:05
  • My dataset is several hundred thousand rows, so the for loop is not really optimal here, and I'd like to use the logic for the column calculation that gets discovered here in other projects. If you have a solution, feel free to post it, otherwise all the info needed should be provided above – wildcat89 Dec 28 '21 at 02:08

1 Answer


One option would be to convert your datetimes to Unix timestamps (in seconds), take them modulo the number of seconds in a day (which leaves only the seconds elapsed since midnight), subtract that from the number of seconds between midnight and 4 PM, and then divide by 60 to get the number of minutes:

df['Minutes_Until_4PM'] = ((16 * 60 * 60) - df.timestamp.mod(24 * 60 * 60)) // 60

Note that `//` floors the division (it rounds toward negative infinity), which might not be the behaviour you're looking for, especially for negative values.
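To make the flooring caveat concrete, here is a minimal sketch. It does not use the timestamp column from the line above; instead it assumes a different route, building 16:00 on each row's own day with `DatetimeIndex.normalize()` and subtracting the index from it. The sample datetimes and the column names `minutes_exact` and `minutes_floor` are made up for illustration.

```python
import pandas as pd

# Two illustrative rows: one before 16:00, one 5.5 minutes after it
idx = pd.to_datetime(["2021-11-29 15:30:00", "2021-11-01 16:05:30"])
df = pd.DataFrame(index=idx)

# 16:00 on each row's own day: midnight of that day plus 16 hours
target = df.index.normalize() + pd.Timedelta(hours=16)

# Signed number of seconds remaining until 16:00 (negative once past it)
seconds_left = (target - df.index).total_seconds()

df["minutes_exact"] = seconds_left / 60   # true division keeps the fraction
df["minutes_floor"] = seconds_left // 60  # floor division rounds toward -inf

print(df)
```

With floor division, a row 5.5 minutes past 16:00 comes out as -6 rather than -5, so true division followed by an explicit rounding step may be the safer choice when negative values matter.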

Daniel P
  • Gives the error `AttributeError: 'DataFrame' object has no attribute 'timestamp'` , which I thought was because it should have read df.index.timestamp, but adding that in gives the error `AttributeError: 'DatetimeIndex' object has no attribute 'timestamp'`... – wildcat89 Dec 28 '21 at 02:32
  • Yes, my proposition started with "get your datetimes into timestamps", which I had left to you to decide how to do. Personally, I created two new columns in the dataframe, in particular because indexes are much less versatile than normal columns/Series: `df['dt']=df.index` and `df['timestamp']=df.dt.apply(datetime.timestamp).astype(int)`. Alternatively, if you populate your index directly with timestamps, you could use `df.index.__mod__()` to take the modulo. – Daniel P Dec 28 '21 at 02:46
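For reference, the timestamp route discussed in this thread could be sketched without `apply` as follows. This is an assumption-laden variant, not the answerer's exact code: it derives epoch seconds by subtracting a hypothetical `epoch` value from the index, which treats the naive datetimes as plain wall-clock values and avoids any dependence on the machine's time zone. The sample datetimes are illustrative.

```python
import pandas as pd

idx = pd.to_datetime(["2021-11-29 15:30:00", "2021-11-01 16:05:00"])
df = pd.DataFrame(index=idx)

# Whole seconds since 1970-01-01 for every row, computed on the whole index
epoch = pd.Timestamp("1970-01-01")  # illustrative helper name
df["timestamp"] = (df.index - epoch) // pd.Timedelta(seconds=1)

# Seconds since midnight via modulo, then the formula from the answer
seconds_per_day = 24 * 60 * 60
df["Minutes_Until_4PM"] = ((16 * 60 * 60) - df["timestamp"].mod(seconds_per_day)) // 60

print(df["Minutes_Until_4PM"])
```

Everything here operates on whole columns, so it should scale to several hundred thousand rows without a Python-level loop.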