Im adding week dates to big dataframes baset on day dates. It takes up to hours when working with 5 mln rows. Here is my code:
import datetime
import pandas as pd
pd.DataFrame(data = ['2022-05-01','2021-05-01','2022-02-15'])
df,col,interval_shift,pivot_index,kwargs = pd.DataFrame(data = ['2022-05-01','2021-05-01','2022-02-15']),0,0,'Week start date',{}
date_col_format = kwargs['date_col_format'] if 'date_col_format' in kwargs else '%Y-%m-%d'
print('Week Number starts',datetime.datetime.now())
df['Week Number'] = df[col].apply(lambda x: ((datetime.datetime.strptime(x, date_col_format) - (pd.DateOffset(days=interval_shift))).strftime('%Y') + '-'+(datetime.datetime.strptime(x, date_col_format)- (pd.DateOffset(days=interval_shift))).strftime('%V')))
print('Week Number ready',datetime.datetime.now())
df[pivot_index] = df['Week Number'].apply(lambda y: (pd.to_datetime(y + '-1', format='%G-%V-%u') + pd.DateOffset(days=interval_shift)).strftime('%Y-%m-%d'))
print('Week pivot_index ready',datetime.datetime.now())
df
I need both cols: week number and date of weeks mondays.
output
0 | Week Number | Week start date | |
---|---|---|---|
0 | 2022-05-01 | 2022-17 | 2022-04-25 |
1 | 2021-05-01 | 2021-17 | 2021-04-26 |
2 | 2022-02-15 | 2022-07 | 2022-02-14 |
explanation for interval_shift is nececity to shift week forward and backward and get number and start date of shifted week (from friday to thurstay for example)