0

I have a sorted pandas data frame like the following:

ID      Values
'AAAA'   1
'AAAA'   2
'AAAA'   3
'BBBB'   4
'CCCC'   5
'CCCC'   6

I have to create a new column "ID_Index" which will be like this

ID      Values  ID_Index
'AAAA'   1         1
'AAAA'   2         1
'AAAA'   3         1
'BBBB'   4         2
'CCCC'   5         3
'CCCC'   6         3

That means the code will check ID of row 1, it will start counting as 1. Then whenever there is a new id the counter will be +1. In SAS we used to do it using first.id and/or last.id. Here I found that using shift() we can create that. I tried the following code in python, but it is not working.

c=1
for index, row in df_pandas.iterrows():
    if (df_pandas['ID'] == df_pandas['ID'].shift()):
        df_pandas['ID_Index']=c
    else:
         df_pandas['ID_Index'] = c+1
print df_pandas
Arun
  • 119
  • 1
  • 11

2 Answers2

1

I think need factorize if want same values:

df['new'] = pd.factorize(df['ID'])[0] + 1

Or if possible multiple groups:

df['new'] = df['ID'].ne(df['ID'].shift()).cumsum()

print (df)
       ID  Values  new
0  'AAAA'       1    1
1  'AAAA'       2    1
2  'AAAA'       3    1
3  'BBBB'       4    2
4  'CCCC'       5    3
5  'CCCC'       6    3

Difference best seen in added new row:

df['new1'] = pd.factorize(df['ID'])[0] + 1
df['new2'] = df['ID'].ne(df['ID'].shift()).cumsum()
print (df)
       ID  Values  new1  new2
0  'AAAA'       1     1     1
1  'AAAA'       2     1     1
2  'AAAA'       3     1     1
3  'BBBB'       4     2     2
4  'CCCC'       5     3     3
5  'CCCC'       6     3     3
6  'AAAA'       7     1     4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Since dataframe is already sorted (assuming sorted by `ID`), I think only the first scenario is applicable. – jpp Aug 29 '18 at 13:02
1

what i do is that i would convert the ID column into a list, manually create ID_Index and then append it to the dataframe

ids=df_pandas['ID'].tolist()
counter=1
id_index=[1]    

for idx,id in enumerate(ids):
    if idx!=0:
        if ids[idx-1]!=id:
            counter+=1
        id_index.append(counter)

df_pandas['ID_Index']=id_index

sure this might not be the perfect way, but gets the job done

Imtinan Azhar
  • 1,725
  • 10
  • 26
  • With Pandas, manual loops are not recommended. See @jezrael's solution and marked duplicate for vectorised solutions. – jpp Aug 29 '18 at 13:04