How to assign (and add as a column) a unique int for each value in a column, starting at 0 for the first value at row zero and incrementing by 1

Asked Sep 09 '19 at 23:46

Active Sep 10 '19 at 00:38

Viewed 35 times

I am looking for a way to assign a new unique integer to every distinct value in a column, starting at 0 for the value at row zero of the df and incrementing by 1 each time a new unique value is encountered while iterating through the rows. Here's a minimal example.

Say this is my data

import pandas as pd

dfso = pd.DataFrame([9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2], columns=['Value'])

dfso 

  Value
0   9
1   3
2   5
3   8
4   4
5   2
6   5
7   6
8   4
9   7
10  9
11  8
12  5
13  3
14  4
15  5
16  6
17  8
18  4
19  2

And here's the result I am looking for

    Value   NewAssign
0   9   0
1   3   1
2   5   2
3   8   3
4   4   4
5   2   5
6   5   2
7   6   6
8   4   4
9   7   7
10  9   0
11  8   3
12  5   2
13  3   1
14  4   4
15  5   2
16  6   6
17  8   3
18  4   4
19  2   5

At row zero, the first value is 9, so 9 is assigned 0. At row one, the value is 3, so 3 is assigned 1. And so on. At row six, the value 5 already has an assignment, so that number is inserted instead, which is 2.
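The rule described above can be sketched in plain Python with a dictionary that remembers the integer given to each value the first time it is seen. This is only an illustration of the rule, not a recommended pandas idiom; the helper name `assign_first_seen_ids` is hypothetical and not part of pandas.

```python
import pandas as pd

dfso = pd.DataFrame(
    [9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2],
    columns=['Value'],
)

# Hypothetical helper (not a pandas function): label values in order of first appearance.
def assign_first_seen_ids(values):
    seen = {}      # maps each value to the integer it received when first encountered
    labels = []
    for v in values:
        if v not in seen:
            seen[v] = len(seen)  # next unused integer, starting at 0
        labels.append(seen[v])   # repeated values reuse their earlier integer
    return labels

dfso['NewAssign'] = assign_first_seen_ids(dfso['Value'])
```

A value that reappears later, such as the 5 at row six, simply looks up the integer stored for it earlier.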

What I tried so far

I tried

pd.factorize(dfso)

But that only resulted in

ValueError: could not broadcast input array from shape (20,1) into shape (20)

Answer

dfso['New'] = pd.factorize(dfso['Value'])[0]
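For context, here is a small sketch of what that line relies on: `pd.factorize` takes a one-dimensional input (here the `Value` column rather than the whole DataFrame) and returns a pair, the integer codes and the unique values in order of first appearance. The variable names `codes` and `uniques` are just illustrative.

```python
import numpy as np
import pandas as pd

dfso = pd.DataFrame(
    [9, 3, 5, 8, 4, 2, 5, 6, 4, 7, 9, 8, 5, 3, 4, 5, 6, 8, 4, 2],
    columns=['Value'],
)

# Pass the single column (a Series), not the DataFrame itself.
codes, uniques = pd.factorize(dfso['Value'])

# codes holds one integer per row; index [0] of the returned tuple is this array.
dfso['New'] = codes

# uniques lists each distinct value once, in the order it first appeared,
# so indexing it with the codes reconstructs the original column.
reconstructed = np.asarray(uniques)[codes]
```

Keeping `uniques` around is useful if the integer labels later need to be mapped back to the original values.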

edited Sep 10 '19 at 00:38

asked Sep 09 '19 at 23:46

SantoshGupta7


no idea what your expected result is. – AidanGawronski Sep 09 '19 at 23:50
Sorry, messed up the formatting, I fixed it – SantoshGupta7 Sep 09 '19 at 23:51
Your code `dfso =... ` does not create your example. – SpghttCd Sep 09 '19 at 23:53
Looks like you just want `pd.factorize` – user3483203 Sep 09 '19 at 23:54
Dupe deals with strings, but it's the same behavior for ints – user3483203 Sep 09 '19 at 23:55
@SpghttCd I fixed the example – SantoshGupta7 Sep 10 '19 at 00:03
