-2

I want to generate numbers from the strings in a column of a DataFrame, with one number representing each unique string.

Below is an example and the desired outcome.

String  Desired outcome
   A    1
   A    1
   B    2
   C    3
   D    4

The code below doesn't work because pd.get_dummies creates one column per unique string, whereas I want a single numeric column.

dummies = pd.get_dummies(df['String'])
Warrior
  • What about using OrdinalEncoder from sklearn? – Luc Bertin Dec 15 '20 at 21:23
  • You can do [this](https://stackoverflow.com/questions/32011359/convert-categorical-data-in-pandas-dataframe/32011969), use LabelEncoder, or use OrdinalEncoder as suggested above. – ssp Dec 15 '20 at 21:32
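
As a rough illustration of the kind of encoding the comments point toward, here is a minimal sketch that uses pd.factorize rather than the sklearn encoders named above. The sample frame is copied from the question, and the output column name Code is an assumption made for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"String": ["A", "A", "B", "C", "D"]})

# pd.factorize returns (codes, uniques). The codes start at 0 and follow
# the order in which each value first appears, so add 1 to start from 1.
codes, uniques = pd.factorize(df["String"])
df["Code"] = codes + 1  # "Code" is a hypothetical column name

print(df)
```

Because the codes follow first appearance, the numbering depends on row order rather than on alphabetical order; the two happen to coincide for this sample.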

2 Answers

0

You can use the ord() function to get the ASCII value (more precisely, the Unicode code point) of a single character, for example:

ord('A')

The call above returns 65. If you want the numbering to start from one, a small helper function such as ordFromOne(character) works fine:

def ordFromOne(c):
    return ord(c) - 64

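Then you just run that over each of your characters. If your characters are stored in a sequence (called example here), you can map the function over it. In Python 3, map returns a lazy iterator, so wrap it in list() to see the results:

list(map(ordFromOne, example))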
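
For reference, a self-contained run of this approach on the sample values from the question might look like the sketch below. The contents of example are assumed, and the approach only applies to single uppercase letters.

```python
def ordFromOne(c):
    # ord() accepts exactly one character; ord('A') is 65, so subtract 64.
    return ord(c) - 64

# Assumed sample values, taken from the question's example column.
example = ["A", "A", "B", "C", "D"]

numbers = list(map(ordFromOne, example))
print(numbers)  # [1, 1, 2, 3, 4]
```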
axwr
0

I don't know much about DataFrames, but for single uppercase letters you can get the desired outcome with:

def char_to_number(c):
    return ord(c) - ord('A') + 1

Docs for ord
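
Note that ord() raises a TypeError for strings longer than one character. If the column holds longer strings, one possible alternative is to number the values in order of first appearance with a dictionary. The helper name encode_in_order and the sample strings below are hypothetical, chosen only for this sketch.

```python
def encode_in_order(values):
    """Map each distinct string to 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            # Assign the next unused number to a string seen for the first time.
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping

# Hypothetical multi-character values standing in for the real column.
encoded, mapping = encode_in_order(["apple", "apple", "banana", "cherry", "date"])
print(encoded)
print(mapping)
```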

Roy Cohen