Say I have a dataframe with repeated values in a column:

import pandas as pd

data = {'1':[1, 1, 1, 2, 2, 3, 4, 4, 4, 4]}
df = pd.DataFrame(data)
df.head(10)

    1
0   1
1   1
2   1
3   2
4   2
5   3
6   4
7   4
8   4
9   4

How would I create another column that counts how many times each value has been seen before, going either up or down the rows?

This is an example of the desired result:

   1  uniquee
0  1        0
1  1        1
2  1        2
3  2        0
4  2        1
5  3        0
6  4        0
7  4        1
8  4        2
9  4        3

Going down the rows, for each repeated value in column '1', the 'uniquee' column holds the number of times that value has already been seen.

A solution I came up with is to use iterrows:

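seen_counts = {}  # maps each value to the number of earlier occurrences
df['uniquee'] = 0  # integer column rather than empty strings
for index, row in df.iterrows():
    value = row['1']
    if value not in seen_counts:
        seen_counts[value] = 0
    else:
        seen_counts[value] += 1
    df.loc[index, 'uniquee'] = seen_counts[value]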

But iterrows is generally discouraged because it tends to be slow. Is there a more computationally efficient method?

SantoshGupta7

2 Answers


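Use groupby with cumcount, which numbers the rows within each group starting from 0. assign returns a new dataframe, so bind the result to a name:

df = df.assign(unique=df.groupby('1').cumcount())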
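
For readers who want to run the cumcount approach end to end, here is a minimal sketch on the question's data. It also covers the "going up the rows" case through the `ascending=False` argument of cumcount. The column name `unique_rev` is a hypothetical name chosen for this sketch, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]})

# Going down the rows: how many times the value has appeared above this row.
df['unique'] = df.groupby('1').cumcount()

# Going up the rows: how many times the value appears below this row.
# 'unique_rev' is a hypothetical column name used only in this sketch.
df['unique_rev'] = df.groupby('1').cumcount(ascending=False)

print(df)
```

Both calls are vectorized per group, so no Python-level loop over rows is involved, which is the point of avoiding iterrows.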
rhug123

Group the dataframe by the value column, use apply with a lambda to build the list of counters for each group, then explode the lists and reset the index. len(x) is used for the group length because x.size counts cells rather than rows.

>>> (df.groupby('1')
...    .apply(lambda x: list(range(len(x))))
...    .explode().to_frame('unique').reset_index()
... )

OUTPUT:

   1 unique
0  1      0
1  1      1
2  1      2
3  2      0
4  2      1
5  3      0
6  4      0
7  4      1
8  4      2
9  4      3
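
One caveat may be worth checking before relying on an explode-based result: it is indexed by group key, so it comes back in key order instead of the original row order. The sketch below uses hypothetical data (`mixed`, with equal values that are not adjacent) to compare that with cumcount, which keeps the original index. It groups a single column (`mixed.groupby('1')['1']`), which is an assumption made here to keep the example small.

```python
import pandas as pd

# Hypothetical data in which equal values are not adjacent.
mixed = pd.DataFrame({'1': [4, 1, 4, 2, 1, 4]})

# cumcount returns a Series indexed like `mixed`, so it lines up row by row.
mixed['unique'] = mixed.groupby('1').cumcount()

# Building one list per group and exploding it yields a Series indexed by
# the group key, sorted by key, not by the original row position.
by_key = (mixed.groupby('1')['1']
          .apply(lambda s: list(range(len(s))))
          .explode())

print(mixed)
print(by_key)
```

On the question's data the two orders coincide because the values are already sorted, which is why the output above matches the desired result.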
ThePyGuy