2

I'm trying to map an integer which represents Year-Month ('Period' column in code below) into a new column that represents the Year-Quarter ('DIST_PERIOD' column).

For instance,

202101, 202102, and 202103 become '20211'

202104, 202105, and 202106 become '20212'

etc.

My code below is running but not how I thought it would.

df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '01') | (str(df['Period'])[4:] == '02') | (str(df['Period'])[4:] == '03'), str(df['Period'])[:4]+'1', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '04') | (str(df['Period'])[4:] == '05') | (str(df['Period'])[4:] == '06'), str(df['Period'])[:4]+'2', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '07') | (str(df['Period'])[4:] == '08') | (str(df['Period'])[4:] == '09'), str(df['Period'])[:4]+'3', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '10') | (str(df['Period'])[4:] == '11') | (str(df['Period'])[4:] == '12'), str(df['Period'])[:4]+'4', df['DIST_PERIOD'])

Not sure how to correct my str() so that I am correctly capturing the last two characters for each row.

mattie_g
  • 95
  • 9
  • "My code below is running but not how I thought it would" This is not a very useful description of the problem. Please [edit] to provide a sample of the input data, the current output, and the expected output so we can better understand the problem. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Nov 08 '21 at 20:27
  • 1
    `df['DIST_PERIOD'] = df['Period'].str[:4] + pd.to_datetime(df['Period']+'01').dt.quarter.astype(str)` – It_is_Chris Nov 08 '21 at 20:30

3 Answers3

3

A better way is to convert the column to datetime and then access and combine the datetime properties year and quarter.

month_year = pd.to_datetime(df['Period'], format="%Y%m")
df['DIST_PERIOD'] = month_year.dt.year.astype(str) + month_year.dt.quarter.astype(str)

EDIT

Another option

month_year = pd.to_datetime(df['Period'], format="%Y%m")
df['DIST_PERIOD'] = month_year.dt.to_period("Q").dt.strftime('%Y%q')
Rodalm
  • 5,169
  • 5
  • 21
  • 1
    whoa what wizardry. thx for showing me this. I believe you meant ``` month_year = pd.to_datetime(df['Period'], format="%Y%m") ``` though, but I totally am understanding this and this is a good future-proof solution. – mattie_g Nov 08 '21 at 20:55
  • @mattie_g No worries, glad to help! Yes, thanks for the correction! I've added another solution – Rodalm Nov 08 '21 at 21:05
1

not sure if the np.where is a good choice here, instead use the map function from pandas.

create a dictionary objects of this mapping such for example: di= {'202101':'20211', '202102':'20211', '202103':'20211'}

same way, add the more dictionary object in above dictionary to create the mapping object.

after that do this:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di)

Note: it will create a NAN if cannot find the all the mapping object. If you want to map only few objects and leave other as untouched, then use:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di).fillna(df['yourcolumnnametobemapped'])
user96564
  • 1,578
  • 5
  • 24
  • 42
  • thanks this helps me move along. the only thing is that the dictionary must be maintained and I was looking for a dynamic solution as years go on. but this definitely helps I was stuck trying to solve this – mattie_g Nov 08 '21 at 20:50
1

If you want to automate it, you can look at a solution like this. You may have to play around with the datatype depending on your dataframe. Here I'm just passing the column into a function and returning a new value. Also assumes you'll always have 4 digit year and two digit month

data='''yrmo
202101
202102
202103
202104
202105
202106
202109
202111'''
df = pd.read_csv(io.StringIO(data), sep=' \s+', engine='python')

def get_quarter(x):
    mo = str(x)[-2:]
    yr = str(x)[0:4]
    if mo in ['01', '02', '03']:
        return yr + '1'
    elif mo in ['04', '05', '06']:
        return yr + '2'
    elif mo in ['07', '08', '09']:
        return yr + '3'
    else:
        return yr + '4'

df['yrmo'].apply(get_quarter)

0    20211
1    20211
2    20211
3    20212
4    20212
5    20212
6    20213
7    20214
Name: yrmo, dtype: object
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14