Converting object to Int pandas

Question

Hello, I am having trouble converting an entire column from object to integer.

I have a data frame and I tried to convert some columns that are detected as object into integer (or float), but none of the answers I have already found work for me.

[Image: first status]

Then I tried to apply the to_numeric method, but it doesn't work. [Image: to_numeric method]

Then I tried a custom method that you can find here, Pandas: convert dtype 'object' to int, but it doesn't work either: data3['Title'].astype(str).astype(int) (I cannot paste the image anymore, so you have to trust me that it doesn't work).

I tried to use the inplace parameter, but it doesn't seem to be supported by those methods.

I am pretty sure the answer is simple, but I cannot find it.

You need to self-assign, e.g. `data3['Title'] = pd.to_numeric(data3['Title'])` or `data3['Title'] = data3['Title'].astype(int)`. There really should be a canonical question for this, as this variant appears umpteen times. (EdChum, Apr 28 '17 at 10:25)

jezrael · Accepted Answer · 2017-04-28T10:30:11.557

You need to assign the output back:

# omitting astype(str) may also work
data3['Title'] = data3['Title'].astype(str).astype(int)

Or:

data3['Title'] = pd.to_numeric(data3['Title'])
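To see why the assignment matters, here is a minimal sketch using the same values as the sample below: astype and to_numeric return a new Series and leave data3 untouched unless the result is assigned back.

```python
import pandas as pd

data3 = pd.DataFrame({'Title': ['15', '12', '10']})

# astype returns a new Series; here the result is simply discarded
data3['Title'].astype(int)
print(data3['Title'].dtype)  # unchanged: still the original text dtype

# Assigning the result back replaces the column
data3['Title'] = data3['Title'].astype(int)
print(data3['Title'].dtype)  # now an integer dtype (width depends on platform)
```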

Sample:

data3 = pd.DataFrame({'Title':['15','12','10']})
print (data3)
  Title
0    15
1    12
2    10

print (data3.dtypes)
Title    object
dtype: object

data3['Title'] = pd.to_numeric(data3['Title'])
print (data3.dtypes)
Title    int64
dtype: object

data3['Title'] = data3['Title'].astype(int)

print (data3.dtypes)
Title    int32
dtype: object
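The last output shows int32 rather than int64, most likely because plain int maps to NumPy's default integer, which was 32-bit on Windows before NumPy 2.0 (pd.to_numeric gave int64 above). If you want the same dtype on every machine, a sketch that requests the width explicitly:

```python
import pandas as pd

data3 = pd.DataFrame({'Title': ['15', '12', '10']})

# 'int64' pins the width, so the result no longer depends on the platform's default int
data3['Title'] = data3['Title'].astype('int64')
print(data3.dtypes)
```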

score 4 · Answer 2 · answered Apr 28 '20 at 09:27

As python_enthusiast said, this command works for me too:

data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)

but it also works fine with:

data3.Title = data3.Title.str.replace(',', '').astype(int)

You have to use .str before replace so that the commas inside each string are removed; only then can you convert to int/float, otherwise you will get an error.
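A minimal sketch of the difference, using made-up values: Series.replace compares whole cell values by default, so ',' only matches a cell that is exactly a comma, while Series.str.replace edits substrings inside each string.

```python
import pandas as pd

s = pd.Series(['4,118,662', '1,000', '25'])

# Series.replace matches whole values, so the commas survive
print(s.replace(',', '').tolist())   # ['4,118,662', '1,000', '25']

# Series.str.replace works on substrings of each string
cleaned = s.str.replace(',', '', regex=False)
print(cleaned.astype(int).tolist())  # [4118662, 1000, 25]
```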

score 2 · Answer 3 · answered Apr 09 '20 at 14:15

2 years and 11 months later, but here I go.

It's important to first check whether your data contains spaces or special characters (like commas, dots, or anything else). If it does, remove them, then convert your string data to float and then to integer. This is what worked for me when my data was numerical values with commas, like 4,118,662.

data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)
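The extra astype(float) step matters when some strings still contain a decimal point after the commas are removed: casting '1234.0' straight to int raises a ValueError, while going through float parses it first. A sketch with made-up values (note that the float-to-int step truncates any fractional part):

```python
import pandas as pd

s = pd.Series(['4,118,662', '1,234.0', '7'])
no_commas = s.str.replace(',', '', regex=False)

try:
    no_commas.astype(int)  # '1234.0' is not a valid integer literal
except ValueError as exc:
    print('direct cast failed:', exc)

result = no_commas.astype(float).astype(int)
print(result.tolist())  # [4118662, 1234, 7]
```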

score 0 · Answer 4 · answered Jul 01 '19 at 17:59


You can also try this code; it works fine for me:

data3.Title= pd.factorize(data3.Title)[0]
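One caveat worth checking on your own data: pd.factorize does not parse the strings as numbers. It assigns each distinct value an integer code in order of first appearance, so '15' becomes 0, not 15. A sketch comparing it with pd.to_numeric on made-up values:

```python
import pandas as pd

titles = pd.Series(['15', '12', '15', '10'])

codes, uniques = pd.factorize(titles)
print(codes.tolist())   # [0, 1, 0, 2]: positions in uniques, not the numbers
print(list(uniques))    # ['15', '12', '10']

# pd.to_numeric, by contrast, parses the actual values
print(pd.to_numeric(titles).tolist())  # [15, 12, 15, 10]
```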


ArwaFahad


score 0 · Answer 5 · edited Jul 25 '21 at 21:31

I had a dataset like this:

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79902 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79902 non-null  object
 1   Video Title       79902 non-null  object
 2   Video ID          79902 non-null  object
 3   Video Views       79902 non-null  object
 4   Comment ID        79902 non-null  object
 5   cleaned_comments  79902 non-null  object
dtypes: object(6)
memory usage: 5.5+ MB

I removed the 'None' and NaN entries using:

import numpy as np
dataset = dataset.replace(to_replace='None', value=np.nan).dropna()
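The literal string 'None' is not treated as missing by dropna, which is why it is mapped to np.nan first. A sketch on a tiny hypothetical frame (the column names mirror the dataset above; the values are invented):

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame({
    'Video Title': ['a', 'b', 'c'],
    'Video Views': ['1200.0', 'None', '35.0'],
})

# 'None' here is a string, so turn it into a real missing value before dropna
dataset = dataset.replace(to_replace='None', value=np.nan).dropna()
print(len(dataset))  # 2
```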

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79868 non-null  object
 1   Video Title       79868 non-null  object
 2   Video ID          79868 non-null  object
 3   Video Views       79868 non-null  object
 4   Comment ID        79868 non-null  object
 5   cleaned_comments  79868 non-null  object
dtypes: object(6)
memory usage: 6.1+ MB

Notice the reduced number of entries (79902 down to 79868).

But the Video Views values were floats, as shown in dataset.head(), while the column dtype was still object.

Then I used:

dataset['Video Views'] = pd.to_numeric(dataset['Video Views'])
dataset['Video Views'] = dataset['Video Views'].astype(int)
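Two steps are needed because pd.to_numeric returns float64 when any of the strings contain a decimal point, and astype(int) then drops the .0. A sketch with invented values:

```python
import pandas as pd

views = pd.Series(['1200.0', '35.0', '7'])

as_numbers = pd.to_numeric(views)
print(as_numbers.dtype)   # float64: some strings contain a decimal point

as_ints = as_numbers.astype(int)
print(as_ints.tolist())   # [1200, 35, 7]
```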

Now dataset.info() shows:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79868 non-null  object
 1   Video Title       79868 non-null  object
 2   Video ID          79868 non-null  object
 3   Video Views       79868 non-null  int64 
 4   Comment ID        79868 non-null  object
 5   cleaned_comments  79868 non-null  object
dtypes: int64(1), object(5)
memory usage: 6.1+ MB

Cam · Answer 6 · 2021-07-25T21:10:36.453

Version that works with Nulls

Older versions of pandas had no way to represent NaN in an integer column, but newer versions offer the nullable Int64 dtype, which uses pd.NA for missing values.

So to go from object to int with missing data, you can do this:

df['col'] = df['col'].astype(float)
df['col'] = df['col'].astype('Int64')

Converting to float first avoids the "object cannot be converted to an IntegerDtype" error.

Note the capital 'I' in 'Int64'.

More info here: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
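A minimal sketch of the two-step conversion on a hypothetical column with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['1', '2', np.nan, '4']})

df['col'] = df['col'].astype(float)    # strings -> float64, missing stays NaN
df['col'] = df['col'].astype('Int64')  # nullable integer dtype, NaN -> pd.NA
print(df['col'].dtype)                 # Int64
print(df['col'].tolist())
```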

Working with pd.NA

Pandas 1.0 introduced the pd.NA scalar; its goal is to provide a "missing" indicator that can be used consistently across data types (instead of np.nan, None or pd.NaT depending on the data type).

With this in mind, pandas added the DataFrame.convert_dtypes() and Series.convert_dtypes() methods, which convert columns to dtypes that support pd.NA. This was considered experimental at the time of writing, but it may well be the future.
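Note that convert_dtypes picks dtypes from the values it sees, so a column of numeric strings becomes a string dtype rather than an integer one. A sketch, on a hypothetical column, of combining it with pd.to_numeric:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({'col': ['1', '2', np.nan, '4']})

# The values are strings, so convert_dtypes keeps the column as text
as_text = raw.convert_dtypes()
print(as_text.dtypes['col'])   # a string dtype, not an integer one

# Parse to numbers first; the integral float64 column then becomes nullable Int64
numeric = raw.assign(col=pd.to_numeric(raw['col']))
as_ints = numeric.convert_dtypes()
print(as_ints.dtypes['col'])   # Int64
```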
