Pandas: index-derived column with specific increments based on other columns

Question

I have the following data frame:

import pandas as pd
pandas_df = pd.DataFrame([
  ["SEX", "Male"], 
  ["SEX", "Female"], 
  ["EXACT_AGE", None], 
  ["Country", "Afghanistan"], 
  ["Country", "Albania"]], 
  columns=['FullName', 'ResponseLabel'
 ])

Now what I need to do is to add sort order to this dataframe. Each new "FullName" would increment it by 100 and each consecutive "ResponseLabel" for a given "FullName" would increment it by 1 (for this specific "FullName"). So I basically create two different sort orders that I sum later on.

pandas_full_name_increment = pandas_df[['FullName']].drop_duplicates()
pandas_full_name_increment = pandas_full_name_increment.reset_index()
pandas_full_name_increment.index += 1 
pandas_full_name_increment['SortOrderFullName'] = pandas_full_name_increment.index * 100

pandas_df['SortOrderResponseLabel'] = pandas_df.groupby(['FullName']).cumcount() + 1

pandas_df = pd.merge(pandas_df, pandas_full_name_increment, on = ['FullName'], how = 'left')

Result:

FullName    ResponseLabel   SortOrderResponseLabel  index   SortOrderFullName   SortOrder
0   SEX Male    1   0   100 101
1   SEX Female  2   0   100 102
2   EXACT_AGE   NULL    1   2   200 201
3   Country Afghanistan 1   3   300 301
4   Country Albania 2   3   300 302

The result that I get on my "SortOrder" column is correct but I wonder if there is some better approach pandas-wise?

Thank you!

Please don't post images. Images are discouraged on StackOverflow. Please take time to read [`how-to-make-good-reproducible-pandas-examples)`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Shubham Sharma, Nov 29 '20 at 11:14

Vishnudev Krishnadas · Answer 1 · 2020-11-29T11:39:48.447

1

The best way to do this would be to use ngroup and cumcount

name_group = pandas_df.groupby('FullName')

pandas_df['sort_order'] = (
    name_group.ngroup(ascending=False).add(1).mul(100) +
    name_group.cumcount().add(1)
)

Output

    FullName ResponseLabel  sort_order
0        SEX          Male         101
1        SEX        Female         102
2  EXACT_AGE          None         201
3    Country   Afghanistan         301
4    Country       Albania         302

edited Nov 29 '20 at 11:39

answered Nov 29 '20 at 11:34

Vishnudev Krishnadas

10,679
2
23
55

I think `cumcount` would be more appropriate here +1 ;) – Shubham Sharma Nov 29 '20 at 11:41

Pandas: index-derived column with specific increments based on other columns

1 Answers1