I have a dataframe that has an ID column and I would like to add a column to the dataframe is an index for each unique ID. I was able to do this using 2 for-loops with the example below by making a list from the ID count, converting it to an array, and adding it to the dataframe.
However, I was hoping to simplify this using pandas groupby() instead and avoid using these 2 for-loops it's not too efficient. I tried playing around with groupby() and set_index() but I couldn't get anything to work. Does anybody know how I can simplify all these steps using pandas groupby()?
df = pd.DataFrame({'ID': (4,4,4,5,6,7,7,7,7),
'Record': (1,4,5,1,1,1,5,7,8),
'color': ('b','r','r','r','r','b','b','b','b')})
ID Record color
4 1 b
4 4 r
4 5 r
5 1 r
6 1 r
7 1 b
7 5 b
7 7 b
7 8 b
##############################################
### Steps I did to add the make sub index ID to df
def convert_to_sub_index_array(count_list):
sub_ID_list = []
for c in count_list:
for i in range(c):
sub_ID_list.append(i+1)
return np.asarray(sub_ID_list)
ID_group=df.groupby(['ID']).count()
ID_count = ID_group['Record'].values.tolist()
sub_ID_index = convert_to_sub_index_array(ID_count)
df['ID_index'] = sub_ID_index
#################################################
ID Record color ID_index
4 1 b 1
4 4 r 2
4 5 r 3
5 1 r 1
6 1 r 1
7 1 b 1
7 5 b 2
7 7 b 3
7 8 b 4