While trying to create a tuple column of latitude and longitude coordinates from two separate columns, I stumbled upon zip as a pretty fast alternative to itertuples, list comprehensions, etc. It needs to be fast because I am dealing with roughly 4M rows and I don't want to waste time on attribute creation.
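For reference, a minimal sketch of the zip approach next to the alternatives mentioned. The small frame is hypothetical sample data, and the claim that the alternatives are usually slower on millions of rows is a general expectation, not something benchmarked here.

```python
import pandas as pd

# Hypothetical sample frame; the real one in the question has ~4M rows.
df = pd.DataFrame({"lat": [52.3535, 52.355511], "long": [5.00611, 4.90732]})

# zip walks both columns in lockstep and yields plain (lat, long) tuples.
df["coords"] = list(zip(df["lat"], df["long"]))

# The alternatives mentioned above, for comparison (usually slower on large frames).
via_itertuples = [(row.lat, row.long) for row in df[["lat", "long"]].itertuples(index=False)]
via_comprehension = [(df.at[i, "lat"], df.at[i, "long"]) for i in df.index]

print(df["coords"].tolist())
```

All three produce the same pairs; zip simply avoids building a row object per row.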
The good thing is, my question pretty much asks itself once you look at the output of the code below. What is happening, and how can it be prevented? I am absolutely positive that, e.g., 52.353500 is as precise as it gets and that the DataFrame is not just cutting it off for display, because six decimal places already correspond to a (very rough) positional precision of about 10 centimeters.
print(df['lat'].head())
print(df['long'].head())
list(zip(df['lat'].head(), df['long'].head()))
Output:
14 52.353500
37 52.355511
42 52.354019
44 52.373829
83 52.354599
Name: lat, dtype: float32
14 5.00611
37 4.90732
42 4.92045
44 4.84816
83 4.89405
Name: long, dtype: float32
[(52.35350036621094, 5.006110191345215),
(52.35551071166992, 4.907320022583008),
(52.35401916503906, 4.920450210571289),
(52.37382888793945, 4.8481597900390625),
(52.35459899902344, 4.894050121307373)]
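The long numbers can be reproduced without pandas. A minimal sketch with NumPy, assuming (as the loading details below state) that the columns are float32: float32 keeps only about 7 significant digits, so 52.3535 is stored as the nearest float32 value. As far as I can tell, iterating a Series (which is what zip does) hands back Python floats, i.e. float64, and widening the stored float32 value exposes its rounding error.

```python
import numpy as np

# 52.3535 has no exact binary representation; float32 stores the nearest value.
f32 = np.float32(52.3535)

# NumPy prints the shortest string that round-trips *as float32*.
short = str(f32)

# Converting to a Python float (float64) widens the stored value exactly,
# so the float32 rounding error shows up in the longer float64 repr.
widened = float(f32)

# Printing with 6 decimals (pandas' default display precision) hides the error.
six_decimals = f"{widened:.6f}"

print(short, repr(widened), six_decimals)  # 52.3535 52.35350036621094 52.353500
```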
As requested: the DataFrame was loaded using read_csv with dtype float32 for both columns.
Solution:
It was a mix of three things: not knowing the limitations of how a Series displays floats, not using float_precision when reading the data in, and then using float32 in combination with float_precision.
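On the display side: pandas formats floats with the display.precision option (6 decimals by default), which is why the Series printout stops at 52.353500. A small sketch that raises the precision temporarily; the one-element Series is hypothetical.

```python
import pandas as pd

s = pd.Series([52.3535], dtype="float32")

# Temporarily show 10 decimals instead of the default 6.
with pd.option_context("display.precision", 10):
    detailed = s.to_string()

print(detailed)  # the float32 rounding error beyond the 6th decimal now appears
```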
Kids, use the float dtype and let pandas decide (it picks float64).
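A minimal sketch of that fix with read_csv, using a small in-memory CSV as a hypothetical stand-in for the real file. dtype float maps to float64, and float_precision="round_trip" asks the C parser to reproduce the text values exactly.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the real 4M-row file.
csv = "lat,long\n52.3535,5.00611\n52.355511,4.90732\n"

# What caused the problem: forcing float32 on both columns.
df32 = pd.read_csv(io.StringIO(csv), dtype={"lat": "float32", "long": "float32"})

# The fix: request a plain float dtype (float64) and a round-trip parser.
df64 = pd.read_csv(
    io.StringIO(csv),
    dtype={"lat": float, "long": float},
    float_precision="round_trip",
)

pairs32 = list(zip(df32["lat"], df32["long"]))
pairs64 = list(zip(df64["lat"], df64["long"]))

print(pairs32[0])  # (52.35350036621094, 5.006110191345215)
print(pairs64[0])  # (52.3535, 5.00611)
```

The float64 columns take twice the memory of float32, which at roughly 4M rows per column is still modest.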