I have a dataset with a date column stored as a continuous number (no separators). I would like to add a new column that contains the week number derived from the value in that date column.

A    B
1    20050121
2    20050111
3    20050205
4    20050101

Here column B holds the date in YEAR|MONTH|DAY format. I would like to add a new column that takes the month and day from each date and tells us which week it belongs to, something like this:

A    B           C
1    20050121    3
2    20050111    2
3    20050205    5
4    20050101    1

The weeks start from 1 January 2005. I thought of splitting out the month and day values separately and then calculating the week from these two values. How can I do this?

Sahil
  • "which takes in the month date from the dataset and tells us which week it belongs to" is rather unclear. Do you mean the week in which the month starts, or do you want to determine the week from the month and day? – Grigory Sep 03 '17 at 07:38
  • Determine which week using Month and Day – Sahil Sep 03 '17 at 07:44
  • No that is fine, strftime suggested by you is really a timesaver! @jezrael – Sahil Sep 03 '17 at 08:55
  • I mention it because it seems my answer was accepted first and then the other one, so perhaps you wanted to accept both. On SO that is not possible, only one answer can be accepted. It is up to you ;) – jezrael Sep 03 '17 at 09:02

2 Answers

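It seems you need strftime with the %W directive (see http://strftime.org/). It numbers weeks starting on Monday, and days before the first Monday of the year fall in week 00: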

df['C'] = pd.to_datetime(df['B'], format='%Y%m%d').dt.strftime('%W')
print (df)
   A         B   C
0  1  20050121  03
1  2  20050111  02
2  3  20050205  05
3  4  20050101  00

If you need integers:

df['C'] = pd.to_datetime(df['B'], format='%Y%m%d').dt.strftime('%W').astype(int)
print (df)
   A         B  C
0  1  20050121  3
1  2  20050111  2
2  3  20050205  5
3  4  20050101  0
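
To see why 20050101 ends up in week 0, here is a minimal sketch using only the standard library. The helper name monday_week is hypothetical and is not part of pandas or the code above; it just applies the same %W directive to single values.

```python
import datetime

def monday_week(yyyymmdd):
    """Return the %W week number: Monday-based, 0 before the first Monday."""
    d = datetime.datetime.strptime(str(yyyymmdd), "%Y%m%d")
    return int(d.strftime("%W"))

# 1 January 2005 was a Saturday, so the first Monday is 3 January 2005.
for value in (20050101, 20050102, 20050103, 20050121):
    print(value, monday_week(value))
```

The first two dates come before the first Monday and get week 0; week 1 only starts on 20050103.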

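If you use weekofyear (the ISO week number), the first days of January can get a value above 50, because they belong to the last ISO week of the previous year. Note that dt.weekofyear was deprecated in pandas 1.1 and removed in pandas 2.0 in favour of dt.isocalendar().week: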

df['C'] = pd.to_datetime(df['B'], format='%Y%m%d').dt.weekofyear
print (df)
   A         B   C
0  1  20050121   3
1  2  20050111   2
2  3  20050205   5
3  4  20050101  53
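
On newer pandas versions, where dt.weekofyear is no longer available, the same ISO week numbers can probably be obtained as sketched below. This assumes pandas 1.1 or later; the sample DataFrame is rebuilt from the question so the block runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [20050121, 20050111, 20050205, 20050101]})

# Convert to strings first so the format string is applied unambiguously.
dates = pd.to_datetime(df["B"].astype(str), format="%Y%m%d")

# isocalendar() returns a DataFrame with year, week and day columns.
df["C"] = dates.dt.isocalendar().week.astype(int)
print(df)
```

As with weekofyear, 20050101 is reported as week 53, since ISO 8601 assigns it to the last week of 2004.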

But it is possible to mask those values:

import numpy as np

dates = pd.to_datetime(df['B'], format='%Y%m%d')
m = (dates.dt.month == 1) & (dates.dt.weekofyear > 50)
df['C'] = np.where(m, 1, dates.dt.weekofyear)
print (df)
   A         B  C
0  1  20050121  3
1  2  20050111  2
2  3  20050205  5
3  4  20050101  1
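
If the weeks should literally be counted as 7-day blocks starting on 1 January, as the question describes, a sketch based on the day of the year might look like this. It is an assumption about what is wanted, and it does not reproduce the expected column exactly: 20050205 is day 36 of the year and therefore lands in block 6, not 5.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [20050121, 20050111, 20050205, 20050101]})

dates = pd.to_datetime(df["B"].astype(str), format="%Y%m%d")

# Days 1-7 of the year map to week 1, days 8-14 to week 2, and so on.
df["C"] = (dates.dt.dayofyear - 1) // 7 + 1
print(df)
```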
jezrael

In general this will work, but there is some confusion around the beginning of the year:

import datetime
import pandas as pd

date_from_str = datetime.datetime.strptime

df = pd.DataFrame([[1, 20050121],
                   [2, 20050111],
                   [3, 20050205],
                   [4, 20050101]], columns=['A', 'B'])
# isocalendar() returns (ISO year, ISO week, ISO weekday); index 1 is the week
df['C'] = df['B'].astype('str').apply(
    lambda date: date_from_str(date, '%Y%m%d').isocalendar()[1])
df

Output is:

   A         B   C
0  1  20050121   3
1  2  20050111   2
2  3  20050205   5
3  4  20050101  53
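
The 53 comes from the ISO 8601 calendar, in which 1 January 2005 still belongs to the last week of 2004. A small standard-library check, included here only as an illustration, makes that visible:

```python
import datetime

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
iso_year, iso_week, iso_weekday = datetime.date(2005, 1, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)
```

The ISO year is 2004, not 2005, and the weekday is 6 (Saturday), so the week number refers to the previous year's calendar.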

To avoid this, the following ad-hoc correction has been suggested elsewhere:

def correct(date_):
    year, week = date_.year, date_.isocalendar()[1]
    # Parse the Monday of that week number using %W (Monday-based weeks)
    ret = datetime.datetime.strptime('%04d-%02d-1' % (year, week), '%Y-%W-%w')
    if datetime.date(year, 1, 4).isoweekday() > 4:
        ret -= datetime.timedelta(days=7)
    return ret.isocalendar()[1]

df['C']= df['B'].astype('str').apply(lambda date:  correct(date_from_str(date,'%Y%m%d')))

Then, output will be:

   A         B  C
0  1  20050121  3
1  2  20050111  2
2  3  20050205  5
3  4  20050101  1
Grigory