How to Fix Floating Point Discrepancies in Python Pandas Dataframes?

Question

I'm reading a CSV file into a pandas DataFrame. When retrieving the data, I'm getting slightly different values than the original data.

I believe it has something to do with the way Python represents decimals. But how do I fix it or work around it?

CSV data example:

1313331280,10.4,0.779
1313334917,10.4,0.316
1313334917,10.4,0.101
1313340309,10.5,0.15
1313340309,10.5,1.8

Pandas dataframe:

df = pd.read_csv(csv_file_full_path, names=['time','price', 'volume'])

The output:

ORDERS_DATA_FRAME.iloc[0]['volume']

source file value = 0.779
the pandas output value = 0.77900000000000003

The data is getting changed when read into the Pandas dataframe. What's the fix?
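A minimal reproduction of the question, assuming the sample rows are inlined with `io.StringIO` instead of `csv_file_full_path`. It passes `float_precision="round_trip"` (a real `read_csv` option for the default C parser) so the parsed value is guaranteed to match Python's own `float()`. Depending on the pandas/NumPy version, the default display may show `0.779` or the 17-digit form from the question; both describe the same stored double.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without a file.
CSV_DATA = """1313331280,10.4,0.779
1313334917,10.4,0.316
1313334917,10.4,0.101
1313340309,10.5,0.15
1313340309,10.5,1.8
"""

# float_precision="round_trip" makes the C parser use Python's string-to-float
# conversion, so each value is the same double that float("0.779") returns.
df = pd.read_csv(io.StringIO(CSV_DATA), names=["time", "price", "volume"],
                 float_precision="round_trip")

volume = df.iloc[0]["volume"]

print(repr(float(volume)))   # '0.779' (shortest string that round-trips)
print(f"{volume:.17g}")      # 17 significant digits expose the binary approximation
print(volume == 0.779)       # True: identical to the Python literal 0.779
```

Nothing in the DataFrame was changed: the value equals the literal `0.779`, and only the number of digits used to print it differs.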

Possible duplicate of [Is floating point math broken?](http://stackoverflow.com/questions/588004/is-floating-point-math-broken) — jonrsharpe, Jul 13 '16 at 11:51
Something else is going on here. First, I'm not doing any math. I'm just reading back data that was read into the Panda's dataframe. Next, it's Python, not JavaScript — Emily, Jul 13 '16 at 14:30
No, it's exactly the same. `0.779` **cannot** be represented exactly as a floating point number, so you see *almost* that number in the dataframe. Language is irrelevant. See e.g. http://floating-point-gui.de/. Also, note that the dataframe doesn't belong to a panda ;o) — jonrsharpe, Jul 13 '16 at 14:33
Regardless, then, is there a fix so that when I read that variable back I get exactly 0.779? As in, the same number that was put into the dataframe? — Emily, Jul 13 '16 at 15:16
Not unless you can come up with a way to represent all of the infinite real numbers in a finite number of bits... Again, you cannot represent that number exactly in floating point arithmetic. — jonrsharpe, Jul 13 '16 at 15:17
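The point made in these comments can be checked with the standard library alone. A double stores a fraction whose denominator is a power of two, while 0.779 is 779/1000, whose denominator contains a factor of 5, so no double can equal it exactly. This sketch uses `fractions.Fraction` and `decimal.Decimal` only to display the stored value exactly.

```python
from decimal import Decimal
from fractions import Fraction

value = 0.779

# as_integer_ratio() returns the exact stored value as a reduced fraction;
# its denominator is always a power of two for a finite float.
numerator, denominator = value.as_integer_ratio()
print(numerator, denominator)

# 779/1000 cannot be written with a power-of-two denominator.
print(Fraction(value) == Fraction(779, 1000))   # False

# Decimal(float) prints every digit of the binary value that is actually stored.
print(Decimal(value))
```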

score 2 · Accepted Answer · answered Jul 15 '16 at 07:51


Though the issue is caused by floating point arithmetic, if you know the maximum number of decimals in your column, you can use `round(float_number, number_of_decimals)` to get back the values as they appear in the file. Alternatively, you can read the column as a string and then convert it to a float using `float(float_number_string)`.
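Both suggestions sketched with pandas, assuming the sample rows are inlined via `io.StringIO` and that `volume` has at most 3 decimals. `Series.round` and the `dtype=` argument of `read_csv` are the pandas counterparts of `round()` and reading as a string. Note that rounding still produces a binary double; it only guarantees the nearest double to the 3-decimal number.

```python
import io

import pandas as pd

CSV_DATA = """1313331280,10.4,0.779
1313334917,10.4,0.316
1313334917,10.4,0.101
1313340309,10.5,0.15
1313340309,10.5,1.8
"""
COLUMNS = ["time", "price", "volume"]

# Suggestion 1: round the column to its known number of decimals (3 here).
df = pd.read_csv(io.StringIO(CSV_DATA), names=COLUMNS)
df["volume"] = df["volume"].round(3)
print(df["volume"].tolist())   # [0.779, 0.316, 0.101, 0.15, 1.8]

# Suggestion 2: keep the column as the raw text from the file,
# and convert with float() only where a number is needed.
df_str = pd.read_csv(io.StringIO(CSV_DATA), names=COLUMNS, dtype={"volume": str})
raw = df_str.iloc[0]["volume"]
print(raw, float(raw))          # 0.779 0.779
```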


Sreyantha Chary


On the second suggestion of reading as a string and converting to a float: the float is still stored as a binary representation, not the literal decimal. Even though it's displayed as `0.779` when you print the variable, it's actually not the same value. – kri Dec 10 '21 at 16:01
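If the exact decimal value matters (for example, for prices or volumes that must add up exactly), one option not mentioned in the answer is to parse the column into Python's `decimal.Decimal` through the `converters=` argument of `read_csv`. This is a sketch assuming the sample rows are inlined via `io.StringIO`; the trade-off is an `object` column that is slower than `float64` and not handled by vectorized NumPy routines.

```python
import io
from decimal import Decimal

import pandas as pd

CSV_DATA = """1313331280,10.4,0.779
1313334917,10.4,0.316
1313334917,10.4,0.101
1313340309,10.5,0.15
1313340309,10.5,1.8
"""

# converters= hands the raw text of each volume field to Decimal,
# so no binary float is ever created for that column.
df = pd.read_csv(io.StringIO(CSV_DATA), names=["time", "price", "volume"],
                 converters={"volume": Decimal})

volume = df.iloc[0]["volume"]
print(repr(volume))                 # Decimal('0.779')
print(volume == Decimal("0.779"))   # True
print(sum(df["volume"]))            # 3.146, computed with exact decimal arithmetic
```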
