1

I’m trying to look at some sales data for a small store. I have a time stamp of when the settlement was made, but sometimes it’s done before midnight and sometimes its done after midnight.

This is giving me data correct for some days and incorrect for others, as anything after midnight should be for the day before. I couldn’t find the correct pandas documentation for what I’m looking for.

Is there an if else solution to create a new column, loop through the NEW_TIMESTAMP column and set a custom timeframe (if after midnight, but before 3pm: set the day before ; else set the day). Every time I write something it either runs forever, or it crashes jupyter.

Data: Data looks like this:

Joshtralia
  • 121
  • 4

1 Answers1

1

What I did is I created another series which says when a day should be offset back by one day, and I multiplied it by a pd.timedelta object, such that 0 turns into "0 days" and 1 turns into "1 day". Subtracting two series gives the right result. Let me know how the following code works for you.

import pandas as pd
import numpy as np


# copied from https://stackoverflow.com/questions/50559078/generating-random-dates-within-a-given-range-in-pandas
def random_dates(start, end, n=15):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

dates = random_dates(start=pd.to_datetime('2020-01-01'),
                     end=pd.to_datetime('2021-01-01'))
timestamps = pd.Series(dates)

# this takes only the hour component of every datetime
hours = timestamps.dt.hour

# this takes only the hour component of every datetime
dates = timestamps.dt.date

# this compares the hours with 15, and returns a boolean if it is smaller
flag_is_day_before = hours < 15

# now you can set the dates by multiplying the 1s and 0s with a day timedelta
new_dates = dates - pd.to_timedelta(1, unit='day') * flag_is_day_before

df = pd.DataFrame(data=dict(timestamps=timestamps, new_dates=new_dates))

print(df)

This outputs

            timestamps   new_dates
0  2020-07-10 20:11:13  2020-07-10
1  2020-05-04 01:20:07  2020-05-03
2  2020-03-30 09:17:36  2020-03-29
3  2020-06-01 16:16:58  2020-06-01
4  2020-09-22 04:53:33  2020-09-21
5  2020-08-02 20:07:26  2020-08-02
6  2020-03-22 14:06:53  2020-03-21
7  2020-03-14 14:21:12  2020-03-13
8  2020-07-16 20:50:22  2020-07-16
9  2020-09-26 13:26:55  2020-09-25
10 2020-11-08 17:27:22  2020-11-08
11 2020-11-01 13:32:46  2020-10-31
12 2020-03-12 12:26:21  2020-03-11
13 2020-12-28 08:04:29  2020-12-27
14 2020-04-06 02:46:59  2020-04-05
yonatansc97
  • 584
  • 6
  • 16
  • When I try and run your code, I get “ TypeError: ufunc subtract cannot use operands with types dtype(' – Joshtralia Nov 09 '20 at 08:00
  • What version of pandas do you have? The above worked on pandas 1.1.3. Try to cast one of the elements to timedelta types. – yonatansc97 Nov 09 '20 at 08:25