
Hello, I have a problem converting a whole column from object to integer.

I have a data frame in which some columns are detected as object, and I tried to convert them to integer (or float), but none of the answers I found work for me.

The initial status:

Then I tried to apply the to_numeric method, but it doesn't work.

Then I tried a custom method from "Pandas: convert dtype 'object' to int", but it doesn't work either: data3['Title'].astype(str).astype(int) (I can't post the image anymore, so you'll have to trust me that it doesn't work.)

I tried to use the inplace parameter, but these methods don't seem to support it:

I am pretty sure the answer is simple, but I cannot find it.

Zoe
Pitchkrak
    You need to self-assign, e.g. `data3['Title'] = pd.to_numeric(data3['Title'])` or `data3['Title'] = data3['Title'].astype(int)`. There really should be a canonical question for this, as this variant appears umpteen times – EdChum Apr 28 '17 at 10:25

6 Answers


You need to assign the output back:

# omitting astype(str) may also work
data3['Title'] = data3['Title'].astype(str).astype(int)

Or:

data3['Title'] = pd.to_numeric(data3['Title'])
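To see why the assignment matters: `astype` and `pd.to_numeric` return a new Series and leave the original column untouched, and neither of them takes an `inplace` argument. A minimal sketch with a hypothetical frame (`dtype=object` is forced so the column starts as object, as in the question):

```python
import pandas as pd

# Hypothetical sample; dtype=object reproduces the "object" column from the question
data3 = pd.DataFrame({'Title': ['15', '12', '10']}, dtype=object)

# astype() returns a NEW Series; without assignment the result is simply discarded
data3['Title'].astype(int)
dtype_without_assignment = data3['Title'].dtype  # still object

# Assigning the result back replaces the column in the frame
data3['Title'] = pd.to_numeric(data3['Title'])
dtype_with_assignment = data3['Title'].dtype  # now int64
```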

Sample:

import pandas as pd

data3 = pd.DataFrame({'Title':['15','12','10']})
print (data3)
  Title
0    15
1    12
2    10

print (data3.dtypes)
Title    object
dtype: object

data3['Title'] = pd.to_numeric(data3['Title'])
print (data3.dtypes)
Title    int64
dtype: object

data3['Title'] = data3['Title'].astype(int)

print (data3.dtypes)
Title    int32
dtype: object
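The `int32` in the last output depends on the platform: `astype(int)` uses NumPy's default integer type, which was 32-bit on Windows before NumPy 2.0 and is 64-bit on most other platforms. If a fixed width matters, one option (not part of the original answer) is to name the dtype explicitly:

```python
import pandas as pd

data3 = pd.DataFrame({'Title': ['15', '12', '10']}, dtype=object)

# 'int64' requests a 64-bit integer regardless of the platform's default int
data3['Title'] = data3['Title'].astype('int64')
```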
jezrael

As python_enthusiast said, this command works for me too:

data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)

but it also works fine with:

data3.Title = data3.Title.str.replace(',', '').astype(int)

You have to use .str before replace in order to remove the commas; only then can you convert to int/float, otherwise you will get an error.
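The reason `.str` is needed: `Series.replace(',', '')` only replaces values that are *entirely* equal to `','`, whereas `Series.str.replace` works on substrings inside each string. A sketch with hypothetical values:

```python
import pandas as pd

s = pd.Series(['4,118,662', '1,000', '25'], dtype=object)

# Series.replace matches whole values, so none of these strings change
unchanged = s.replace(',', '')

# Converting the unchanged strings fails because int() cannot parse '4,118,662'
try:
    unchanged.astype(int)
    plain_replace_works = True
except ValueError:
    plain_replace_works = False

# Series.str.replace removes the comma substring inside each value
cleaned = s.str.replace(',', '', regex=False).astype(int)
```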

dt170

2 years and 11 months later, but here I go.

It's important to check first whether your data contains spaces or special characters (commas, dots, or anything else). If it does, you need to remove them, convert the string data to float, and then convert it to integer. This is what worked for me when my data was numeric values with commas, like 4,118,662.

data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)
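The float step matters when a string contains a decimal point: `astype(int)` cannot parse a string such as `'1234.0'`, while `astype(float)` can. A sketch that strips commas and whitespace in one pass; the regular expression and the sample values are assumptions, not part of the original answer:

```python
import pandas as pd

titles = pd.Series([' 4,118,662 ', '1,234.0', '7'], dtype=object)

# Remove commas and any whitespace characters with a single regex
stripped = titles.str.replace(r'[,\s]', '', regex=True)

# Parse as float first so '1234.0' works, then cast to int (drops the .0)
as_int = stripped.astype(float).astype(int)
```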

You can also try this code; it works fine for me:

data3.Title= pd.factorize(data3.Title)[0]
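A caution about `pd.factorize`: it does not parse the strings as numbers. It returns a pair `(codes, uniques)`, where `codes` labels each distinct value in order of first appearance, so the integers it produces are category labels rather than the original values. A comparison on a hypothetical sample:

```python
import pandas as pd

titles = pd.Series(['15', '12', '10', '15'], dtype=object)

# factorize: one integer label per distinct value, in order of first appearance
codes, uniques = pd.factorize(titles)

# to_numeric: the numbers actually written in the strings
values = pd.to_numeric(titles)
```

So `factorize` fits when you want category codes; for numeric conversion `pd.to_numeric` or `astype` is the tool.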

I had a dataset like this:

dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79902 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79902 non-null  object
 1   Video Title       79902 non-null  object
 2   Video ID          79902 non-null  object
 3   Video Views       79902 non-null  object
 4   Comment ID        79902 non-null  object
 5   cleaned_comments  79902 non-null  object
dtypes: object(6)
memory usage: 5.5+ MB

I removed the None/NaN entries using:

import numpy as np
dataset = dataset.replace(to_replace='None', value=np.nan).dropna()
dataset.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79868 non-null  object
 1   Video Title       79868 non-null  object
 2   Video ID          79868 non-null  object
 3   Video Views       79868 non-null  object
 4   Comment ID        79868 non-null  object
 5   cleaned_comments  79868 non-null  object
dtypes: object(6)
memory usage: 6.1+ MB

Notice the reduced number of entries.
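The `replace` step is what makes `dropna()` remove anything here: a literal string `'None'` is an ordinary value to pandas, not a missing one. A small sketch with a hypothetical column:

```python
import numpy as np
import pandas as pd

views = pd.DataFrame({'Video Views': ['120.0', 'None', '75.0']}, dtype=object)

# The string 'None' is not treated as missing, so dropna() keeps all rows
kept = views.dropna()

# After mapping 'None' to NaN, dropna() removes that row
cleaned = views.replace(to_replace='None', value=np.nan).dropna()
```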

But the Video Views values were floats, as dataset.head() showed.

Then I used:

dataset['Video Views'] = pd.to_numeric(dataset['Video Views'])
dataset['Video Views'] = dataset['Video Views'].astype(int)

Now, dataset.info() shows:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Query             79868 non-null  object
 1   Video Title       79868 non-null  object
 2   Video ID          79868 non-null  object
 3   Video Views       79868 non-null  int64 
 4   Comment ID        79868 non-null  object
 5   cleaned_comments  79868 non-null  object
dtypes: int64(1), object(5)
memory usage: 6.1+ MB
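Because the views look like floats, `pd.to_numeric` produces `float64`, and the following `astype(int)` drops any fractional part (truncating toward zero, not rounding). A sketch with hypothetical values; the `errors='coerce'` line is an alternative not used in the answer, which turns unparseable strings into NaN (those rows would then need to be dropped or filled before casting to int):

```python
import pandas as pd

views = pd.Series(['1200.0', '75.9', '3'], dtype=object)

as_float = pd.to_numeric(views)       # float64, because some strings contain '.'
as_int = as_float.astype('int64')     # 75.9 becomes 75: the fraction is truncated

# errors='coerce' converts strings that cannot be parsed into NaN instead of raising
messy = pd.to_numeric(pd.Series(['10', 'n/a'], dtype=object), errors='coerce')
```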
Zoe
Ketan

Version that works with Nulls

Older versions of pandas had no NaN for integer columns, but newer versions offer the nullable Int64 dtype, which uses pd.NA for missing values.

So to convert from object to int when there is missing data, you can do this:

df['col'] = df['col'].astype(float)
df['col'] = df['col'].astype('Int64')

Converting to float first avoids the "object cannot be converted to an IntegerDtype" error.

Note the capital 'I' in 'Int64'.

More info here: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
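A minimal run of the two-step conversion on a hypothetical column containing a missing value (the column name `col` follows the answer; the data is assumed). Note that the float-to-`Int64` cast only succeeds when every non-missing value is a whole number:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['1', np.nan, '3']}, dtype=object)

# object -> float64: the strings are parsed and the missing value stays NaN
df['col'] = df['col'].astype(float)

# float64 -> nullable Int64: whole-number floats become ints, NaN becomes pd.NA
df['col'] = df['col'].astype('Int64')
```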

Working with pd.NA

Pandas 1.0 introduced the new pd.NA scalar; its goal is to provide a “missing” indicator that can be used consistently across data types (instead of np.nan, None or pd.NaT depending on the data type).

With this in mind, pandas added the DataFrame.convert_dtypes() and Series.convert_dtypes() methods, which convert columns to data types that support pd.NA. At the time of writing this was considered experimental, but it looks promising.
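One point worth knowing: `convert_dtypes()` picks pd.NA-capable dtypes but does not parse numeric strings (an object column of strings becomes a string dtype). A sketch that parses first with `pd.to_numeric` and then lets `convert_dtypes()` choose the nullable integer type; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical object column of numeric strings with one missing value
s = pd.Series(['10', np.nan, '20'], dtype=object)

# Parse the strings; the NaN forces float64
parsed = pd.to_numeric(s)

# float64 holding only whole numbers -> nullable Int64, with NaN shown as <NA>
converted = parsed.convert_dtypes()
```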

Cam