3

I have an set of stock information, with datetime set as index, stock market only open on weekdays so all my rows are weekdays, which is fine, I would like to determine if a row is start of the week or end of week, which might NOT always fall on Monday/Friday due to holidays. A better idea is to determine if there is an row entry on the next/previous day in the dataframe ( since my data is guaranteed to only exist for workday), but I dont know how to calculate this. Here is an example of my data:

date    day_of_week day_of_month    day_of_year month_of_year
5/1/2017    0   1   121 5
5/2/2017    1   2   122 5
5/3/2017    2   3   123 5
5/4/2017    3   4   124 5
5/8/2017    0   8   128 5
5/9/2017    1   9   129 5
5/10/2017   2   10  130 5
5/11/2017   3   11  131 5
5/12/2017   4   12  132 5
5/15/2017   0   15  135 5
5/16/2017   1   16  136 5
5/17/2017   2   17  137 5
5/18/2017   3   18  138 5
5/19/2017   4   19  139 5
5/23/2017   1   23  143 5
5/24/2017   2   24  144 5
5/25/2017   3   25  145 5
5/26/2017   4   26  146 5
5/30/2017   1   30  150 5

Here is my current code

# Date fields
def DateFields(df_input):
    dates = df_input.index.to_series()  
    df_input['day_of_week'] = dates.dt.dayofweek
    df_input['day_of_month'] = dates.dt.day
    df_input['day_of_year'] = dates.dt.dayofyear
    df_input['month_of_year'] = dates.dt.month
    df_input['isWeekStart'] = "No" #<--- Need help here
    df_input['isWeekEnd'] = "No" #<--- Need help here
    df_input['date'] = dates.dt.strftime('%Y-%m-%d')
    return df_input

How can I calculate if a row is beginning of week and end of week?

Example of what I am looking for:

date    day_of_week day_of_month    day_of_year month_of_year isWeekStart isWeekEnd
5/1/2017    0   1   121 5   1   0
5/2/2017    1   2   122 5   0   0
5/3/2017    2   3   123 5   0   0
5/4/2017    3   4   124 5   0   1 # short week, Thursday is last work day
5/8/2017    0   8   128 5   1   0
5/9/2017    1   9   129 5   0   0
5/10/2017   2   10  130 5   0   0
5/11/2017   3   11  131 5   0   0
5/12/2017   4   12  132 5   0   1
5/15/2017   0   15  135 5   1   0
5/16/2017   1   16  136 5   0   0
5/17/2017   2   17  137 5   0   0
5/18/2017   3   18  138 5   0   0
5/19/2017   4   19  139 5   0   1
5/23/2017   1   23  143 5   1   0 # short week, Tuesday is first work day
5/24/2017   2   24  144 5   0   0
5/25/2017   3   25  145 5   0   0
5/26/2017   4   26  146 5   0   1
5/30/2017   1   30  150 5   1   0

EDIT: I forgot that some holidays fall during the middle of week, in this situation, it would be good if it can treat these as a separate "week" with before and after marked accordingly. Although if it's not smart enough to figure this out, just getting the long weekend would be a good start.

Bill Software Engineer
  • 7,362
  • 23
  • 91
  • 174
  • Does this answer your question? [Get week start date (Monday) from a date column in Python (pandas)?](https://stackoverflow.com/questions/27989120/get-week-start-date-monday-from-a-date-column-in-python-pandas) – Vivs May 19 '21 at 03:59
  • Have a look at [offsets.BusinessDay](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessDay.html). – Quang Hoang May 19 '21 at 04:05
  • I am sorry I don't know how to apply either of those links, can you guys create an answer based on the code I currently have? – Bill Software Engineer May 19 '21 at 04:13
  • In the event that a public holiday falls in the middle of the week, for example a Wednesday. What value should Tuesday and Thursday take? – Corralien May 19 '21 at 04:22
  • Oh... I haven't considered that situation, it would be good if it's smart enough to also count those a "distinct" week and there for mark them as start and end, although if it's not smart enough to figure that out, just figuring out the long weekend would be a great start. – Bill Software Engineer May 19 '21 at 04:27

2 Answers2

2

Here's an idea with BusinessDay:

prev_working_day = df['date'] - pd.tseries.offsets.BusinessDay(1)

df['isFirstWeekDay'] = (df['date'].dt.isocalendar().week != 
                        prev_working_day.dt.isocalendar().week)

And similar for last business day. Note that the default holiday calendar is US'. Check out this post for a different one.

Output:

         date  day_of_week  day_of_month  day_of_year  month_of_year  isFirstWeekDay
0  2017-05-01            0             1          121              5            True
1  2017-05-02            1             2          122              5           False
2  2017-05-03            2             3          123              5           False
3  2017-05-04            3             4          124              5           False
4  2017-05-08            0             8          128              5            True
5  2017-05-09            1             9          129              5           False
6  2017-05-10            2            10          130              5           False
7  2017-05-11            3            11          131              5           False
8  2017-05-12            4            12          132              5           False
9  2017-05-15            0            15          135              5            True
10 2017-05-16            1            16          136              5           False
11 2017-05-17            2            17          137              5           False
12 2017-05-18            3            18          138              5           False
13 2017-05-19            4            19          139              5           False
14 2017-05-23            1            23          143              5           False
15 2017-05-24            2            24          144              5           False
16 2017-05-25            3            25          145              5           False
17 2017-05-26            4            26          146              5           False
18 2017-05-30            1            30          150              5           False
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
2

Here's an approach using weekly groupby.

df['date'] = pd.to_datetime(df['date'])

business_days = df.assign(date_copy = df['date']).groupby(pd.Grouper(key='date_copy', freq='W'))['date'].apply(list).to_frame()
business_days['isWeekStart'] = business_days['date'].apply(lambda x: [1 if i == min(x) else 0 for i in x])
business_days['isWeekEnd'] = business_days['date'].apply(lambda x: [1 if i == max(x) else 0 for i in x])
business_days = business_days.apply(pd.Series.explode)

pd.merge(df, business_days, left_on='date', right_on='date')

output:

          date  day_of_week day_of_month    day_of_year month_of_year   isWeekStart isWeekEnd
0   2017-05-01            0            1            121             5             1         0
1   2017-05-02            1            2            122             5             0         0
2   2017-05-03            2            3            123             5             0         0
3   2017-05-04            3            4            124             5             0         1
4   2017-05-08            0            8            128             5             1         0
5   2017-05-09            1            9            129             5             0         0
6   2017-05-10            2           10            130             5             0         0
7   2017-05-11            3           11            131             5             0         0
8   2017-05-12            4           12            132             5             0         1
9   2017-05-15            0           15            135             5             1         0
10  2017-05-16            1           16            136             5             0         0
11  2017-05-17            2           17            137             5             0         0
12  2017-05-18            3           18            138             5             0         0
13  2017-05-19            4           19            139             5             0         1
14  2017-05-23            1           23            143             5             1         0
15  2017-05-24            2           24            144             5             0         0
16  2017-05-25            3           25            145             5             0         0
17  2017-05-26            4           26            146             5             0         1
18  2017-05-30            1           30            150             5             1         1

Note that 2017-05-30 is marked as both WeekStart and WeekEnd because it is the only date of that week.

Gusti Adli
  • 1,225
  • 4
  • 13