0

I have a Pandas DataFrame containing train stations latitudes and longitudes, called stops:

station_id,station_name,station_lat,station_lon
'101','Van Cortlandt Park - 242 St',40.889248,-73.898583
'103','238 St',40.884667,-73.90087
'104','231 St',40.878856,-73.904834
'106','Marble Hill - 225 St',40.874561,-73.909831

Using pd.iterrows(), I coupled the latitudes and longitudes into tuples, packed them into a list, converted the list into a Pandas Series, and added the Series to the DataDrame as a new column, like so:

latlon_list = []
for ind, row in stops.iterrows():
    latlon_list.append((stops.at[ind, 'station_lat'], stops.at[ind, 'station_lon']))
latlon_series = pd.Series(latlon_list)
stops.insert(loc=4, column='station_lat_lon', value=latlon_series)

This mutated some of the lat lon values and added a few decimal places to them:

station_id  station_name    station_lat station_lon station_lat_lon
0   101 Van Cortlandt Park - 242 St 40.889248   -73.898583  (40.889247999999995, -73.898583)
1   103 238 St  40.884667   -73.900870  (40.884667, -73.90087)
2   104 231 St  40.878856   -73.904834  (40.878856, -73.904834)
3   106 Marble Hill - 225 St    40.874561   -73.909831  (40.874561, -73.90983100000001)
4   107 215 St  40.869444   -73.915279  (40.869444, -73.915279)

The mutation occurred in row 0 and row 3, but not the other rows. On row 0, the latitude figure is mutated, but on row 3, the longitude figure is mutated. Why does this happen, and how can I prevent it from happening?

Elan
  • 539
  • 1
  • 6
  • 18
  • 1
    This is due to floating point inaccuracy. To "stop it" you will probably need to use `decimal` module. But you likely do *not* want to do this as you will lose vectorised functionality. See also [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken). – jpp Oct 18 '18 at 02:31
  • 1
    you're looking at floating point precision problem, find corresponding documentation on pandas & numpy. for a shortcut you might want to convert it to string first to conserve precision, or perhaps use `decimal.Decimal` module. – deadvoid Oct 18 '18 at 02:32
  • It's not even doing arithmetic operations like `x * y`, just transferring numbers from one place to another. Would it be better to convert the numbers into strings first before transferring, and converting back to float with fixed decimal points afterwards? – Elan Oct 18 '18 at 02:33
  • I haven abandoned the lat-lon column and opted to separately retrieve the data from each lat and lon column every time. Just thought it would have been more convenient for data retrieval to have lat and lon in one column, turns out it's not worth the trouble. – Elan Oct 18 '18 at 03:45
  • I went on to try stuff like `round(number, 6)` while iterating each cell from the problem column and each figure inside the tuple of that cell. It didn't work because some values that already had 6 decimal places and ended with a 0 (`74.098790` for instance) ended up with 5 decimal places after rounding. A bit of time wasted but nice to know about the floating point quirk. – Elan Oct 18 '18 at 03:49

0 Answers0