0

I have a dataframe p_md. The index is a DateTime. I want to make a new column called Finish where if the index is before 5PM of that day, the column value is 11PM of that day. Otherwise if it is after 5PM, the Finish column value is 11PM of the NEXT day.

What I have so far:

p_md["Finish"] = pd.Timestamp(datetime(p_md.index.year, p_md.index.month, p_md.index.day, 23, 0, 0))

p_md.loc[(p_md.index.hour > 17), "Finish"] = p_md.Finish + pd.Timedelta(days=1)

When I do this I get a TypeError stating that the datetime constructor is getting a int64Index instead of an int. So I changed the line to

p_md["Finish"] = pd.Timestamp(datetime(p_md.index.year[0], p_md.index.month[0], p_md.index.day[0], 23, 0, 0))

This compiles and runs, however it only uses the first row of the dataframe's values, presumably due to the [0].

Table Creation code request: I just read the DateTime from a csv file, but here is basically what the initial table looks like:

df = pd.DataFrame()

df['DateTime'] = pd.date_range("1/1/2017", periods=500, freq="H")
df.set_index("DateTime", inplace=True)
df["Test"] = 0
john
  • 169
  • 12

1 Answers1

0

If I understand you correctly, I would create the "Finish" as:

p_md["Finish"] = p_md.index

Then, I would use the series apply (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html) function to get the desired output:

p_md["Finish"] = p_md["Finish"].apply(lambda dt: pd.Timestamp(dt.year, dt.month, dt.day, 23, 0 ,0) + pd.Timedelta(days=1) 
                                                 if dt.hour > 17 else pd.Timestamp(dt.year, dt.month, dt.day, 23, 0 ,0))

Or you can create a separate function and assign it to the apply:

def cvt_date(dt):
    new_dt = pd.Timestamp(dt.year, dt.month, dt.day, 23, 0 ,0)
    if dt.hour > 17:  new_dt = new_dt + pd.Timedelta(days=1)
    return new_dt

p_md['Finish'] = p_md['Finish'].apply(cvt_date)

here's the original data and outputs I get:

2016-03-04 03:48:41 - 2016-03-04 23:00:00
2016-02-05 22:08:25 - 2016-02-06 23:00:00
2016-12-11 19:13:54 - 2016-12-12 23:00:00

Vico
  • 579
  • 3
  • 13
  • perfect, thanks! Could you explain the lambda dt part a bit more? I'm new to python and the lambda syntax is still confusing to me. I'm not sure how python knows that dt is referring to the datetime object in ["Finish"] and how to treat it as a datetime object – john Aug 20 '17 at 17:52
  • Lambdas are one way of defining functions. There's a good discussion on lambda here: https://stackoverflow.com/questions/890128/why-are-python-lambdas-useful Python knows that it is a datetime object because I am using the .apply function on the datetime series "Finish". Apply basically iterates over the series and I defined each iteration value as dt. you should get similar results with something like this: [cvt_date(value) for value in p_mod['finish']] – Vico Aug 20 '17 at 19:23