I am looking for a way to assign a unique number to every distinct value in a column, starting at 0 for the value in the first row of the df and incrementing by 1 each time a new distinct value is encountered while iterating through the rows. Here's a minimal example.
Say this is my data:
import pandas as pd
dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2], columns=['Value'])
dfso
Value
0 9
1 3
2 5
3 8
4 4
5 2
6 5
7 6
8 4
9 7
10 9
11 8
12 5
13 3
14 4
15 5
16 6
17 8
18 4
19 2
And here's the result I am looking for:
Value NewAssign
0 9 0
1 3 1
2 5 2
3 8 3
4 4 4
5 2 5
6 5 2
7 6 6
8 4 4
9 7 7
10 9 0
11 8 3
12 5 2
13 3 1
14 4 4
15 5 2
16 6 6
17 8 3
18 4 4
19 2 5
At row zero, the first value is 9, so 9 is assigned 0. At row one, the value is 3, so 3 is assigned 1. And so on. At row six, the value 5 already has an assignment, so that number is inserted instead, which is 2.
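The rule described above can be sketched in plain Python. This is only an illustration of the intended numbering, not a pandas solution; the helper name assign_ids is hypothetical and the input is just the first seven values of the example column.

```python
def assign_ids(values):
    """Number each distinct value in order of first appearance, starting at 0."""
    seen = {}    # maps each value to the number it was given (hypothetical helper state)
    result = []
    for v in values:
        if v not in seen:
            # First time this value appears: give it the next unused number.
            seen[v] = len(seen)
        result.append(seen[v])
    return result

first_rows = [9, 3, 5, 8, 4, 2, 5]   # rows 0 to 6 of the example column
ids = assign_ids(first_rows)
print(ids)  # [0, 1, 2, 3, 4, 5, 2]
```

The second 5 gets 2 again because it was already numbered at row two.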
What I tried so far
I tried calling pd.factorize on the whole DataFrame:
pd.factorize(dfso)
But that only resulted in an error:
ValueError: could not broadcast input array from shape (20,1) into shape (20)
Answer
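pd.factorize expects one-dimensional input, so pass the column rather than the whole DataFrame and keep the first element of the result:
dfso['New'] = pd.factorize(dfso['Value'])[0]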
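For context, pd.factorize returns a pair (codes, uniques), which is why the line above indexes the result with [0]. By default the codes follow the order of first appearance, which is the numbering asked for. A minimal sketch using the first ten values of the example column; the names codes and uniques are chosen here for illustration:

```python
import pandas as pd

# First ten values of the example column from the question.
dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7], columns=['Value'])

# factorize on a single column (a 1-D Series) returns the integer codes
# and the distinct values in the order they were first seen.
codes, uniques = pd.factorize(dfso['Value'])

dfso['New'] = codes
print(dfso)
print(list(uniques))  # [9, 3, 5, 8, 4, 2, 6, 7]
```

Keeping uniques around is handy if you later need to map the codes back to the original values, since uniques[code] gives the value for a given code.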