I'm working on a dataset that has 35 columns and 3047 rows. One of the columns is 'State', which contains the 50 US states, and I want to convert those states into numeric values, e.g. Washington is 1, West Virginia is 2, etc.

df.loc[df['State']=='Washington','State']=1
df.loc[df['State']=='West Virginia','State']=2
.
.
df.loc[df['State']=='Arizona','State']=50
df['State']=df['State'].astype(str).astype(int)

I got this error: ValueError: invalid literal for int() with base 10: 'Washington'. Can anyone help me? What can I do to fix this problem? Is there another way to convert a column of dtype object to int? Thanks in advance.

Stidgeon
Ana
  • Does this answer your question? [How can I one hot encode in Python?](https://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python) – Yevhen Kuzmovych Mar 28 '21 at 22:36

1 Answer

Assuming you have a dataframe like this:

           State
0     Washington
1  West Virginia
2        Arizona
3     Washington

Then you can use pd.Categorical() to convert it to codes:

import pandas as pd
# .codes are 0-based positions in the sorted categories, hence the + 1
df["State"] = pd.Categorical(df["State"]).codes + 1
print(df)

Prints:

   State
0      2
1      3
2      1
3      2
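
The codes above are not arbitrary: with default arguments, pd.Categorical() sorts the unique values, so each code is the state's alphabetical position. A minimal sketch of how the number-to-state lookup could be recovered, assuming the same four-row toy frame (the names `cat`, `lookup` and `State_code` are illustrative, not from the original post):

```python
import pandas as pd

# Toy frame matching the example above (stand-in for the real data).
df = pd.DataFrame({"State": ["Washington", "West Virginia", "Arizona", "Washington"]})

cat = pd.Categorical(df["State"])

# cat.categories holds the sorted unique states; code k (0-based) refers to categories[k].
lookup = {name: code + 1 for code, name in enumerate(cat.categories)}
print(lookup)

# Store the codes in a new column so the original names stay available for checking.
df["State_code"] = cat.codes + 1
print(df)
```

Writing the codes to a separate column, rather than overwriting State, makes it easy to compare names and numbers row by row.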
Andrej Kesely
  • It works: the State column is converted, but the values don't match the other columns and rows. It seems like the numbers are assigned randomly. – Ana Mar 29 '21 at 00:05
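
If a specific numbering is required (Washington is 1, West Virginia is 2, and so on, as in the question), one option is an explicit dictionary passed to Series.map(). The sketch below is only an illustration under assumptions: the three-state `state_order` list and the `State_code` column name are hypothetical, and the real list would need all 50 states in the desired order.

```python
import pandas as pd

# Toy data standing in for the real 3047-row frame (assumption).
df = pd.DataFrame({"State": ["Washington", "West Virginia", "Arizona", "Washington"]})

# Hypothetical numbering; extend this list to all 50 states in the order you want.
state_order = ["Washington", "West Virginia", "Arizona"]
state_to_code = {name: i for i, name in enumerate(state_order, start=1)}

# Strip stray whitespace, then look each state up; unmatched names become NaN.
codes = df["State"].str.strip().map(state_to_code)

# Report names missing from the mapping instead of failing later in astype(int).
unmapped = df.loc[codes.isna(), "State"].unique()
if len(unmapped) > 0:
    raise ValueError(f"States missing from the mapping: {list(unmapped)}")

df["State_code"] = codes.astype(int)
print(df)
```

Checking for unmapped values first points directly at any misspelled or padded state name, which is the kind of leftover string that makes astype(int) raise the ValueError from the question.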