
I've got a pandas DataFrame with a float index (one decimal place) which I use to look up values, similar to a dictionary. As floats are not exactly the value they are supposed to be, I multiplied everything by 10 and converted it to integers with .astype(int) before setting it as the index. However, this seems to do a floor instead of rounding. Thus 1.999999999999999992 is converted to 1 instead of 2. Rounding with the pandas.DataFrame.round() method beforehand does not avoid this problem, as the values are still stored as floats.

The original idea (which obviously raises a KeyError) was this:

import numpy as np
import pandas as pd

idx = np.arange(1, 3, 0.001)
s = pd.Series(range(2000))
s.index = idx
print(s[2.022])  # KeyError, as described above

Trying again after converting to integers:

idx_int = idx*1000
idx_int = idx_int.astype(int)
s.index = idx_int
for i in range(1000,3000):
    print(s[i])

The output is always a bit random, as the 'real' value of an integer can be slightly above or below the wanted value. In this case the index contains the value 1000 twice and does not contain the value 2999.

NicoH

4 Answers


You are right, astype(int) does a conversion toward zero:

‘integer’ or ‘signed’: smallest signed int dtype

from pandas.to_numeric documentation (which is linked from astype() for numeric conversions).

If you want to round, you need to do a float round, and then convert to int:

df.round(0).astype(int)

Use other rounding functions according to your needs.
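As a minimal sketch of the difference, here is the index from the question (scaled by 1000, as in its code) converted both ways. The variable names `truncated` and `rounded` are only illustrative, and `np.round` is used here as one possible rounding function:

```python
import numpy as np
import pandas as pd

idx = np.arange(1, 3, 0.001)   # floats, many slightly off their nominal value

# astype(int) alone drops the fractional part (conversion toward zero).
truncated = (idx * 1000).astype(int)

# Rounding to the nearest whole number first, then converting.
rounded = np.round(idx * 1000).astype(int)

# How many positions differ between the two conversions.
print((truncated != rounded).sum())

s = pd.Series(range(len(idx)), index=rounded)
print(s[2022])  # 1022
```

With the rounded index every integer label appears exactly once, so the lookups behave like dictionary access.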


the output is always a bit random as the 'real' value of an integer can be slightly above or below the wanted value

Floats can represent whole numbers exactly (for float64, up to 2**53 in magnitude), so a conversion after round(0) is lossless and not risky; check here for details.
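A small check of that claim, assuming float64 values well below 2**53 in magnitude (the sample values are made up for illustration):

```python
import numpy as np

values = np.array([1.9999999, 2.0000001, 2999.4999, -0.6])
rounded = np.round(values)

# After rounding, every value is an exact whole number stored as a float.
assert np.all(rounded == np.trunc(rounded))

# Converting to int and back gives the same floats: nothing is lost.
as_int = rounded.astype(np.int64)
assert np.array_equal(as_int.astype(np.float64), rounded)
print(as_int.tolist())  # [2, 2, 2999, -1]

# Beyond 2**53, neighbouring integers are no longer distinguishable as floats.
print(float(2**53) == float(2**53 + 1))  # True
```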

Arne
Giacomo Catenazzi
  • Did you mean `floor` rather than `ceil`? (Though actually, it's neither: it's a truncation operation - i.e., it rounds towards zero, rather than towards positive infinity (ceil) or towards negative infinity (floor).) – Mark Dickinson Mar 07 '18 at 14:22
  • @MarkDickinson: right. I had it correct in the first version, but then I confused smallest with 'ceil' (while meaning 'floor'). Verified, "smallest" is toward zero. Thank you. – Giacomo Catenazzi Mar 07 '18 at 14:30
  • Also, if you're worried about NaNs, `df.round(0).astype(pd.Int64Dtype())` :) (https://stackoverflow.com/questions/21287624/convert-pandas-column-containing-nans-to-dtype-int#21290084) – Tomasz Gandor Sep 29 '20 at 23:25
  • @TomaszGandor, the problem with using pd.Int64Dtype() is that you cannot subsequently call fillna(''), as is typical to render a table with blank space for NaN. Throws: "TypeError – alancalvitti Nov 13 '21 at 16:06
  • @alancalvitti - this may not be the right approach (mangling data for visualization), but probably recasting it again `.astype(object).fillna('')` could do the trick. – Tomasz Gandor Nov 13 '21 at 21:02
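The two suggestions from these comments can be sketched together as follows. This assumes a pandas version with the nullable `Int64` dtype (the string alias `"Int64"` is equivalent to `pd.Int64Dtype()`); the sample Series is made up:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.2, 2.9, np.nan])

# Round first, then convert to the nullable integer dtype; NaN becomes <NA>.
as_nullable = s.round(0).astype("Int64")
print(as_nullable.isna().tolist())  # [False, False, True]

# For display only: go through object dtype so the missing value
# can be replaced by an empty string.
blank = as_nullable.astype(object).fillna("")
print(blank.tolist())
```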

If I understand correctly, you could just round first and then convert the result to an integer?

s1 = pd.Series([1.2,2.9])
s1 = s1.round().astype(int)

Which gives the output:

0    1
1    3
dtype: int32
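One detail worth knowing with this approach: `Series.round()` rounds exact halves to the nearest even number. A short sketch, where the `np.floor(s + 0.5)` variant is just one assumed way to get round-half-up if that is what you need:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, 2.5, -0.5])

# Series.round() sends exact halves to the nearest even whole number.
half_even = s.round().astype(int)
print(half_even.tolist())  # [0, 2, 2, 0]

# Round-half-up alternative: shift by 0.5 and take the floor.
half_up = np.floor(s + 0.5).astype(int)
print(half_up.tolist())  # [1, 2, 3, 0]
```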
Matt

In case the DataFrame contains both numeric and non-numeric values and you only want to touch the numeric fields:

df = df.applymap(lambda x: int(round(x, 0)) if isinstance(x, (int, float)) else x)
momo
  • This is very useful to round all the elements in a Dataframe – Carlos AG Dec 14 '21 at 16:18
  • Select_dtypes is an alternative to doing list comprehension: `df.select_dtypes(include=np.number).applymap(lambda x: int(round(x, 0)))` – tbrk Jul 13 '22 at 14:58
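Note that the `select_dtypes` one-liner returns only the numeric columns. A possible sketch that keeps the other columns, using a made-up frame and assuming the numeric columns contain no NaN (in pandas 2.1 and later `applymap` is deprecated in favour of `DataFrame.map`, so this version avoids both):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with mixed column types.
df = pd.DataFrame({"a": [1.2, 2.9], "b": ["x", "y"], "c": [3, 4]})

# Pick the numeric columns, round them, and write the integers back.
num_cols = df.select_dtypes(include=np.number).columns
df[num_cols] = df[num_cols].round(0).astype(int)

print(df["a"].tolist())  # [1, 3]
print(df["b"].tolist())  # ['x', 'y']
```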

The DataFrame may contain NA values, which are stored as floats, so an alternative solution is: df.fillna(0).astype('int')
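Keep in mind that `astype('int')` on its own still truncates, so if rounding is wanted it has to be added explicitly. A small sketch with made-up values, assuming that replacing NA with 0 is acceptable for your data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.9, np.nan, -2.6])

# Filling NA and converting directly truncates toward zero.
truncated = s.fillna(0).astype("int")
print(truncated.tolist())  # [1, 0, -2]

# Adding round() before the conversion gives the nearest integer instead.
rounded = s.fillna(0).round().astype("int")
print(rounded.tolist())  # [2, 0, -3]
```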

Yuchao Jiang