1

I have a DataFrame that contains months and years:

df:
    month   year
0   Jan     2012.0
1   Feb     2012.0
2   Mar     2012.0
3   Apr     2012.0
4   May     2012.0
5   Jun     2012.0
6   Jul     2012.0
7   Aug     2012.0
8   Sep     2012.0
9   Oct     2012.0
10  Nov     2012.0
11  Dec     2012.0

I want to add another column which determines a business-year which starts on Mar on every year Something like this:.

df:
        month   year     business_year
    0   Jan     2012.0     2011
    1   Feb     2012.0     2011
    2   Mar     2012.0     2012
    3   Apr     2012.0     2012
    4   May     2012.0     2012
    5   Jun     2012.0     2012
    6   Jul     2012.0     2012
    7   Aug     2012.0     2012
    8   Sep     2012.0     2012
    9   Oct     2012.0     2012
    10  Nov     2012.0     2012
    11  Dec     2012.0     2012
    12  Jan     2013.0     2012
    13  Feb     2013.0     2012
Sam Al-Ghammari
  • 1,021
  • 7
  • 23
  • 1
    Why is "month" spelled "monath"? – cs95 Mar 11 '19 at 17:26
  • 2
    `df['business_year'] = df['year'] + df['monath'].apply(lambda x: -1 if x in ('Jan', 'Feb') else 0)`, assuming the first column is a string – Tarifazo Mar 11 '19 at 17:28
  • @Mstaino Worked, thanks. I have a question regarding this reoccurring warning everytime such column calls are happening in Pandas. `A value is trying to be set on a copy of a slice from a DataFrame.`, do you know how to get around this? – Sam Al-Ghammari Mar 11 '19 at 17:39
  • 1
    @debuggingXD "Thanks buddy, it worked", perhaps... but [please refrain from writing code that uses `apply`, because of terrible performance.](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care). To answer your other question, I have an [answer here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas/53954986#53954986) that will help you, if you enjoy reading. – cs95 Mar 11 '19 at 17:40
  • 1
    @debuggingXD strange, the warning is caused because you are applying the formula to a part of the dataframe (which is strange). Also note that I gave you a very easy solution, but not the fastest as @coldspeed points out (that is, if you have a big `df` and want performance). I posted a solution both with and without apply so you have both examples. – Tarifazo Mar 11 '19 at 19:09

2 Answers2

4

Assuming your month is a string, you can use the following snippet:

df['business_year'] = df['year'] + df['month'].apply(lambda x: -1 if x in ('Jan', 'Feb') else 0)

Or, if you want something more performant:

df['business_year'] = df['year'] + ~df1['month'].isin(('Jan', 'Feb')) - 1
Tarifazo
  • 4,118
  • 1
  • 9
  • 22
  • You should cut out the first solution altogether, because your second one is purely vectorised and does not require working with datetimes at all. Good answer. – cs95 Mar 11 '19 at 19:11
2

IIUC, use pd.to_datetime to convert to datetime. You can then subtract 2 months from each date and return the corresponding year of the result.

import calendar

mapping = {calendar.month_abbr[i]: i for i in range(13)}
df['month'] = df['month'].map(mapping)

(pd.to_datetime(df.assign(day=1)) - pd.offsets.MonthBegin(2)).dt.year

0     2011
1     2011
2     2012
3     2012
4     2012
5     2012
6     2012
7     2012
8     2012
9     2012
10    2012
11    2012
dtype: int64
cs95
  • 379,657
  • 97
  • 704
  • 746