21

This is the error that shows up whenever I try to convert the dataframe to int.

("invalid literal for int() with base 10: '260,327,021'", 'occurred at index Population1')

Everything in the df is a number. I assume the error is due to the extra quote at the end, but how do I fix it?

Caribgirl

4 Answers

20

I run this

int('260,327,021')

and get this

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-448-a3ba7c4bd4fe> in <module>()
----> 1 int('260,327,021')

ValueError: invalid literal for int() with base 10: '260,327,021'

I assure you that not everything in your dataframe is a number. It may look like a number, but it is a string with commas in it.

You'll want to strip out the commas and then convert to int:

pd.Series(['260,327,021']).str.replace(',', '').astype(int)

0    260327021
dtype: int64
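
If every column looks like that, the same idea can be applied column by column. This is only a minimal sketch: the frame below is made-up sample data (the `Population2` column and all values except '260,327,021' are assumptions), and it assumes every column holds comma-formatted integer strings.

```python
import pandas as pd

# Hypothetical sample data; only '260,327,021' comes from the question.
df = pd.DataFrame({
    "Population1": ["260,327,021", "1,234"],
    "Population2": ["98,765", "42"],
})

# Remove the thousands separators in each column, then cast to int.
# regex=False treats ',' as a literal character rather than a pattern.
cleaned = df.apply(lambda col: col.str.replace(",", "", regex=False).astype(int))

print(cleaned)
```

If some columns are already numeric, `.str` will fail on them, so in that case restrict the `apply` to the string columns only.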
piRSquared
  • oh wow. Thank you so much! So this means i have to replace everything then. Every number in the df looks like the one i gave. – Caribgirl May 08 '17 at 23:22
  • @Caribgirl yes! Unless you read it from a file, then you can pass a parameter to the `read_csv` function, namely `thousands=','` – piRSquared May 08 '17 at 23:22
  • Thank you so much it worked! omg. I have been trying to fix this for hours! Thank You!!!!! – Caribgirl May 08 '17 at 23:28
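
To illustrate the `thousands=','` suggestion from the comment above, here is a small sketch. The CSV text is invented for the example (the `Country` column and the second row are assumptions), and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; the numbers are quoted because they contain
# the same character that separates the fields.
csv_text = 'Country,Population1\nA,"260,327,021"\nB,"1,234"\n'

# thousands=',' tells the parser to treat commas inside numbers as
# thousands separators, so the column is parsed as integers directly.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df.dtypes)
```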
13

Others might encounter the following issue, when the string is a float:

>>> int("34.54545")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '34.54545'

The workaround for this is to convert to a float first and then to an int:

>>> int(float("34.54545"))
34

Or pandas specific:

df.astype(float).astype(int)
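
Keep in mind that casting float to int drops the fractional part (it truncates toward zero) rather than rounding. A short sketch with made-up values, in case rounding is what you actually want:

```python
import pandas as pd

# Hypothetical values chosen to show the difference.
s = pd.Series(["34.54545", "-2.9", "7"])

# astype(int) on floats discards the fractional part.
truncated = s.astype(float).astype(int)

# Round first if the nearest integer is what you need.
rounded = s.astype(float).round().astype(int)

print(truncated.tolist(), rounded.tolist())
```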
kristian
7

I solved the error using pandas.to_numeric

In your case,

data.Population1 = pd.to_numeric(data.Population1, errors="coerce")

Here, `data` is the DataFrame.

After that, you can convert the resulting floats to int as well (assign the result back, since astype returns a new Series):

data.Population1 = data.Population1.astype(int)
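
One caveat worth checking on your own data: `errors="coerce"` turns anything it cannot parse into NaN, and as far as I can tell that includes comma-formatted strings like the one in the question, so the commas still need to be removed first. The sketch below uses invented sample values ("42" and "n/a" are assumptions) and the nullable `Int64` dtype so that missing values do not break the integer cast.

```python
import pandas as pd

# Hypothetical sample; only '260,327,021' comes from the question.
raw = pd.Series(["260,327,021", "42", "n/a"])

# Coercing directly: the comma-formatted value is not parsed and becomes NaN.
coerced = pd.to_numeric(raw, errors="coerce")

# Strip the commas first, then coerce; only the genuinely bad value is NaN.
stripped = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# A plain astype(int) would raise because of the NaN;
# the nullable Int64 dtype keeps it as <NA> instead.
as_int = stripped.astype("Int64")

print(coerced)
print(as_int)
```

Alternatively, `.fillna(0).astype(int)` works if replacing missing values with 0 is acceptable for your data.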
Abhishek Sinha
  • This solved my issue of having a space in the values. Thank you. I also had to add `.fillna(0)`, i.e. `df['series'].fillna(0).astype(int)`, to get rid of the NaNs for my particular issue. – JQTs Jan 25 '22 at 20:41
2

For me, it was a bit different case.

I loaded my dataframe as such:

my_converter = {'filename': str, 'revision_id': int}

df = pd.read_csv("my.csv", header=0, sep="\t", converters=my_converter)

because the output of head -n 3 my.csv looked like this:

"filename"     "revision_id"
"some_filename.pdf"     "224"
"another_filename.pdf"     "128"

However, thousands of lines further down, there was an entry like this:

 "very_\"special\"_filename.pdf"     "46"

which meant that I had to pass the escape character to read_csv(). Otherwise, the backslash-escaped quotes split the field in the wrong place, and pandas tried to cast part of the filename (special) to int for the revision_id field, which raised the error.

So the correct call is:

df = pd.read_csv("my.csv", header=0, sep="\t",  escapechar='\\', converters=my_converter)
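
Here is a self-contained sketch of that fix, using `io.StringIO` in place of my.csv and the rows shown above. How the parser misreads the line without `escapechar` may depend on the exact file contents and pandas version, so the sketch only shows the working call.

```python
import io

import pandas as pd

# In-memory stand-in for my.csv: tab-separated, quoted fields, with one
# filename that contains backslash-escaped double quotes.
lines = [
    '"filename"\t"revision_id"',
    '"some_filename.pdf"\t"224"',
    '"another_filename.pdf"\t"128"',
    '"very_\\"special\\"_filename.pdf"\t"46"',
]
csv_text = "\n".join(lines) + "\n"

my_converter = {"filename": str, "revision_id": int}

# escapechar='\\' makes the parser treat \" as a literal quote inside
# the quoted field instead of as the end of the field.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    sep="\t",
    escapechar="\\",
    converters=my_converter,
)

print(df)
```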
Bikash Gyawali