create sub-index of groups with pandas and groupby

Question

I have a dataframe that has an ID column and I would like to add a column to the dataframe is an index for each unique ID. I was able to do this using 2 for-loops with the example below by making a list from the ID count, converting it to an array, and adding it to the dataframe.

However, I was hoping to simplify this using pandas groupby() instead and avoid using these 2 for-loops it's not too efficient. I tried playing around with groupby() and set_index() but I couldn't get anything to work. Does anybody know how I can simplify all these steps using pandas groupby()?

df = pd.DataFrame({'ID': (4,4,4,5,6,7,7,7,7),
          'Record': (1,4,5,1,1,1,5,7,8),
          'color': ('b','r','r','r','r','b','b','b','b')})

    ID   Record    color
     4        1        b
     4        4        r
     4        5        r
     5        1        r
     6        1        r
     7        1        b
     7        5        b
     7        7        b
     7        8        b

##############################################
### Steps I did to add the make sub index ID to df
def convert_to_sub_index_array(count_list):
sub_ID_list = []
for c in count_list:
    for i in range(c):
        sub_ID_list.append(i+1)

return np.asarray(sub_ID_list)


ID_group=df.groupby(['ID']).count()
ID_count = ID_group['Record'].values.tolist()
sub_ID_index = convert_to_sub_index_array(ID_count)
df['ID_index'] = sub_ID_index
#################################################

    ID   Record    color    ID_index
     4        1        b           1
     4        4        r           2
     4        5        r           3
     5        1        r           1
     6        1        r           1
     7        1        b           1
     7        5        b           2
     7        7        b           3
     7        8        b           4

@jezrael How do you find the other posts that answer questions so quickly? I looked for a long time and couldn't find the answer to my question and you found it in like 30 seconds. I would like to get better at searching for answers so I can avoid posting duplicate questions which I didn't know were duplicate. — zipline86, Sep 11 '18 at 07:41
hmmm, there is main problem I know what looking for - `cumcount` function. If dont know name of it, it is not easy find it. — jezrael, Sep 11 '18 at 07:43

create sub-index of groups with pandas and groupby

0 Answers0