Say I have a dataframe with a column that contains repeating values:
import pandas as pd

data = {'1': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]}
df = pd.DataFrame(data)
df.head(10)
   1
0  1
1  1
2  1
3  2
4  2
5  3
6  4
7  4
8  4
9  4
How would I create another column that holds a running count of how many times each value has been seen before, going either up or down the rows?
This is an example of the desired result:
   1  uniquee
0  1        0
1  1        1
2  1        2
3  2        0
4  2        1
5  3        0
6  4        0
7  4        1
8  4        2
9  4        3
Going down the rows, whenever a value in column '1' repeats, the 'uniquee' column holds the number of times that value has already been seen.
A solution I came up with is to use iterrows:
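# Maps each value to the number of times it has been seen before the current row
seen_set = {}
df['uniquee'] = ''
for index, row in df.iterrows():
    if row['1'] not in seen_set:
        seen_set[row['1']] = 0
    else:
        seen_set[row['1']] += 1
    # Record the running count for this row
    df.loc[index, 'uniquee'] = seen_set[row['1']]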
But iterrows is generally discouraged because it tends to be slow compared with vectorized operations. Is there a more computationally efficient method?
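For comparison, here is a minimal sketch of one vectorized approach that seems to fit, assuming the count should cover every earlier occurrence of a value in the whole column (which is what the loop above does), not just consecutive runs. It uses pandas' `groupby(...).cumcount()`. The `uniquee_up` column name is made up for illustration and shows the same count taken from the bottom row upward.

```python
import pandas as pd

data = {'1': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]}
df = pd.DataFrame(data)

# Number each row within its group of equal values in column '1', starting at 0.
# Going down the rows, this is how many times the value has been seen before.
df['uniquee'] = df.groupby('1').cumcount()

# Hypothetical extra column: the same count going up the rows instead,
# i.e. how many times the value appears further down.
df['uniquee_up'] = df.groupby('1').cumcount(ascending=False)

print(df)
```

With the sample data, `uniquee` comes out as 0, 1, 2, 0, 1, 0, 0, 1, 2, 3, matching the desired result above. Whether this is actually faster on a given dataset is worth timing rather than assuming.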