Weird problem (bug?) in Pandas astype('int')

Question

Try this:

import pandas as pd

x=pd.Series([314.02,314.03])

You get back:

0    314.02
1    314.03

Now, multiply the series by 100:

y=(100*x)

You get:

0    31402.0
1    31403.0
dtype: float64

Now, convert to integer type:

y.astype('int64')

you get:

0    31402
1    31402

Huh???? %$#X#$% Very weird behavior!!!

If I instead type:

y.round(0).astype('int64')

I get the expected result:

0    31402
1    31403

Is this a bug I should be reporting? Or is there some subtle issue in the floating-point representation of 31403.0? I'm stumped as to why I am seeing this behavior. I would have thought the round() fix should not be necessary.

This is not an isolated example, either: it happens with many floating-point numbers! What am I missing?

I'm running Python 3.8.2, Pandas 1.2.5

Thanks in advance for any help in understanding this behavior.

You don't even need `pandas`. Check the result of `31403.0 == 314.03 * 100`. — Matthias, Jul 18 '21 at 21:48
This should answer your question: https://stackoverflow.com/questions/49153253/pandas-rounding-when-converting-float-to-integer — saferif, Jul 18 '21 at 21:48
Floating-point arithmetic in many languages does not behave as you might expect: `314.03 * 100 => 31402.999999999996`. Converting to `int` truncates, so the fractional part is dropped, leaving `31402`. When _printing_ the Series, however, the default display behaviour _rounds_ the _displayed_ value (not the actual value), which is why it shows `31403.0`. As you noted, _rounding_ first produces the expected result, since it takes the digits after the decimal point into account rather than just removing them. — Henry Ecker, Jul 18 '21 at 21:53
Answers my question, understood that floating points are approximations, but never expected astype('int') to round a floating point 31403.0 down to an integer 31402! I'm not testing equality. Thanks for alerting me to astype truncating when converting! — Austin Ken, Jul 18 '21 at 22:00
`astype(int)` does not _round_. It _truncates_. The fractional part is not considered at all: `int(6.1) -> 6`, and likewise `int(6.9999999) -> 6`. That is why, as you've already discovered, calling `round` first works. — Henry Ecker, Jul 18 '21 at 22:01
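
For anyone who wants to check the comments above without pandas, here is a minimal plain-Python sketch. The variable names are only illustrative, and it assumes that `astype('int64')` truncates toward zero in the same way the built-in `int()` does, which is what the comments describe.

```python
from decimal import Decimal

# The same multiplication as in the question, on a plain Python float.
product = 314.03 * 100

# repr() shows the shortest string that round-trips to the stored float.
print(repr(product))  # 31402.999999999996

# Decimal(float) prints the exact binary value the float holds,
# which makes it clear the stored number is slightly below 31403.
print(Decimal(product))

# int() drops the fractional part (truncation toward zero).
truncated = int(product)

# Rounding first moves the value to the nearest whole number, then converts.
rounded = int(round(product))

print(truncated, rounded)  # 31402 31403
```

The Series display hides this because it prints a rounded value by default, so `31403.0` on screen does not mean the stored float is exactly 31403.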
