Float issue when using list(zip(...)) on Dataframe float32 columns

Question

While trying to create a tuple column of latitude and longitude coordinates from two separate columns, I stumbled upon zip as a pretty fast alternative to itertuples, list comprehensions, etc. It needs to be fast because I am dealing with roughly 4M rows and I don't want to waste my time on attribute creation.

The good thing is that my question pretty much asks itself once you look at the output of this code: what is happening, and how can it be prevented? I am absolutely positive that e.g. 52.353500 is as precise as it gets and that the DataFrame is not just cutting it off for display, because this already corresponds to a (very rough) positional precision of 10 centimeters.

print(df['lat'].head())
print(df['long'].head())
list(zip(df['lat'].head(), df['long'].head()))

Output:

14    52.353500
37    52.355511
42    52.354019
44    52.373829
83    52.354599
Name: lat, dtype: float32

14    5.00611
37    4.90732
42    4.92045
44    4.84816
83    4.89405
Name: long, dtype: float32

[(52.35350036621094, 5.006110191345215),
 (52.35551071166992, 4.907320022583008),
 (52.35401916503906, 4.920450210571289),
 (52.37382888793945, 4.8481597900390625),
 (52.35459899902344, 4.894050121307373)]

As requested: The Dataframe was loaded using read_csv with dtype float32 for both columns.

Solution: It was a mixture of me not knowing the limitations of Series representation of floats, not using float_precision when reading the data in and using float32 in combination with float_precision. Kids, use float dtype and let Pandas decide (to use float64).
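For anyone wondering where the extra digits come from, here is a minimal sketch that uses only the standard library. The helper name `as_float32` is made up for illustration. It squeezes a Python float (which is 64-bit) through a 32-bit float and back, which is roughly what happens when a float32 column is turned into Python floats by zip or tolist.

```python
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float.

    Hypothetical helper, only meant to mimic what storing a value in a
    float32 column does to it.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


lat = 52.3535             # value as written in the CSV (assumed)
stored = as_float32(lat)  # nearest value a float32 can actually hold

print(repr(stored))       # 52.35350036621094, as in the zip output above
print(round(stored, 6))   # 52.3535
```

The value 52.3535 has no exact binary representation, and float32 only carries about 7 significant decimal digits, so the nearest storable value differs from the input after the sixth decimal place. float64 has the same kind of error, just much further out.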

You should show us how you loaded your DataFrame, so we can replicate it and help you make sure that there wasn't extra precision. — Kyle, Jun 04 '19 at 13:45

cs95 · Accepted Answer · 2019-06-04T14:26:14.047


This is perfectly well-defined behaviour: pandas only shortens the digits it displays, based on the preset display precision, while the stored value is unchanged:

import math
import pandas as pd

math.pi
# 3.141592653589793

pi is shown with 15 decimal places here. In a Series, however, it is displayed with fewer:

pd.Series([math.pi])

0    3.141593
dtype: float64

pd.Series([math.pi]).tolist()
# [3.141592653589793]

This is because,

pd.get_option('display.precision')
# 6

Read more about Options and Settings and how you can change them.
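As a rough sketch of how the display option can be changed temporarily, `pd.option_context` sets it only inside the `with` block. The value 15 is just an example and not a recommendation.

```python
import math
import pandas as pd

s = pd.Series([math.pi])

# Default display: rounded to 6 decimal places for printing only.
default_view = repr(s)

# Temporarily raise the display precision; the stored data is untouched.
with pd.option_context("display.precision", 15):
    wide_view = repr(s)

print(default_view)
print(wide_view)
```

Outside the `with` block the option falls back to its previous value, so this is handy for a quick check of what a Series really contains.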

If you want to actually round your floats to a certain precision, use round:

pd.Series([math.pi]).round(decimals=6).tolist()
# [3.141593]
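One caveat for the float32 columns in the question: rounding a float32 Series gives back float32, so the extra digits would reappear once the values become Python floats. A possible sketch is to widen to float64 first and then round. The small DataFrame below is made up from the first two rows of the question's output, and the decimals (6 and 5) are assumed from what is displayed there.

```python
import pandas as pd

# Stand-in for the real data: two rows, stored as float32 like in the question.
df = pd.DataFrame(
    {"lat": [52.3535, 52.355511], "long": [5.00611, 4.90732]},
    dtype="float32",
)

# Widen to float64 before rounding, otherwise the result stays float32.
lat = df["lat"].astype("float64").round(6)
long = df["long"].astype("float64").round(5)

coords = list(zip(lat.tolist(), long.tolist()))
print(coords)  # [(52.3535, 5.00611), (52.355511, 4.90732)]
```

This still works column-wise, so it should stay fast on a few million rows. Reading the columns as float64 in the first place would avoid the conversion altogether.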

edited Jun 04 '19 at 14:26

answered Jun 04 '19 at 13:47


oh wow, didn't know about the Series part. I guess `the Dataframe is not just cutting it of for view` is wrong then. However, I would assume that the following decimal digits were just zeros and an expanded view would show exactly this? Because, as I said, I am sure about the input only having six digit precision. – dasjanik Jun 04 '19 at 13:51
@dasjanik I can't comment further without any more context, but like I said you can always round up to the desired precision. – cs95 Jun 04 '19 at 13:52

looking at the Dataframe with `.tolist()` reveals that, indeed, the "faulty" floats were hiding there all the time - https://stackoverflow.com/a/47368368 was the answer I didn't know I needed. Even though neither `high` nor `round_trip` precision eliminates every error this brought me to either using decimals or really rounding. Thanks for pointing me at the precision limitation/option with Series. – dasjanik Jun 04 '19 at 14:34
