
I am looking for a way to assign a new unique integer to every distinct value in a column. Numbering should start at 0 with the value in row zero of the df and increase by 1 each time a new unique value is encountered while iterating through the rows. Here's a minimal example.

Say this is my data

import pandas as pd
dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2], columns=['Value'])

dfso 

  Value
0   9
1   3
2   5
3   8
4   4
5   2
6   5
7   6
8   4
9   7
10  9
11  8
12  5
13  3
14  4
15  5
16  6
17  8
18  4
19  2

And here's the result I am looking for

    Value   NewAssign
0   9   0
1   3   1
2   5   2
3   8   3
4   4   4
5   2   5
6   5   2
7   6   6
8   4   4
9   7   7
10  9   0
11  8   3
12  5   2
13  3   1
14  4   4
15  5   2
16  6   6
17  8   3
18  4   4
19  2   5

At row zero, the first value is 9, so 9 is assigned 0. At row one, the value is 3, so 3 is assigned 1. And so on. At row six, the value 5 already has an assignment, so that number is inserted instead, which is 2.
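The rule described above (give each value the next unused integer the first time it appears, and reuse that integer afterwards) can be sketched in plain Python with a dictionary. This is only an illustration of the logic, not pandas code; the function name `first_appearance_codes` is hypothetical.

```python
def first_appearance_codes(values):
    """Assign 0, 1, 2, ... to each distinct value in order of first appearance."""
    codes = {}   # maps each value seen so far -> its assigned integer
    result = []
    for v in values:
        if v not in codes:
            # A value not seen before gets the next unused integer
            codes[v] = len(codes)
        result.append(codes[v])
    return result

values = [9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2]
print(first_appearance_codes(values))
# [0, 1, 2, 3, 4, 5, 2, 6, 4, 7, 0, 3, 2, 1, 4, 2, 6, 3, 4, 5]
```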

What I tried so far

I tried

pd.factorize(dfso)

But that only resulted in

ValueError: could not broadcast input array from shape (20,1) into shape (20)
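The shape in that message, (20,1), matches the 2-D array underlying the one-column DataFrame, whereas `pd.factorize` is meant for 1-D input such as a single Series. A quick check of the two shapes, assuming the same `dfso` as above:

```python
import pandas as pd

dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2],
                    columns=['Value'])

# The whole DataFrame is 2-D: 20 rows x 1 column
print(dfso.to_numpy().shape)   # (20, 1)

# Selecting the column gives a 1-D Series
print(dfso['Value'].shape)     # (20,)
```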

Answer

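Pass the column (a 1-D Series) instead of the whole DataFrame. pd.factorize returns a (codes, uniques) tuple, and because sort defaults to False the codes follow the order of first appearance, so the first element is exactly the numbering wanted:

dfso['NewAssign'] = pd.factorize(dfso['Value'])[0]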
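A self-contained version of the answer, also showing the `uniques` half of the tuple. As an alternative (not part of the original answer), `groupby(..., sort=False).ngroup()` numbers groups in the order they are first seen and should give the same codes here; the column name `NewAssign2` is just for comparison.

```python
import pandas as pd

dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2],
                    columns=['Value'])

# Factorize the 1-D column; codes are assigned in order of first appearance
codes, uniques = pd.factorize(dfso['Value'])
dfso['NewAssign'] = codes

# Alternative: with sort=False, groups are numbered in order of first appearance
dfso['NewAssign2'] = dfso.groupby('Value', sort=False).ngroup()

print(dfso['NewAssign'].tolist())
print(list(uniques))   # the distinct values, in the order their codes were assigned
```

`uniques[code]` maps each code back to its original value, e.g. code 0 corresponds to 9.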

SantoshGupta7
