-1

My Python program is automatically adding extra digits to a float longitude. I am creating new columns "src" and "des" from other columns in a CSV. In line 4 of the result CSV, "des" contains a longitude value that differs from the one in Dropoff_longitude. I want the value kept exactly as it is, with no modification when the tuple is built.

Code:

import pandas as pd
import numpy as np

def loc(x):
    # Build a (latitude, longitude) tuple from a two-column row.
    return (round(x.iloc[0], 14), round(x.iloc[1], 14))


columns=["Pickup_latitude","Pickup_longitude","Dropoff_latitude","Dropoff_longitude"]
df=pd.read_csv("demo.csv")
df["src"]=df[ ["Pickup_latitude","Pickup_longitude"] ].apply(loc,axis=1)
df["des"]=df[ ["Dropoff_latitude","Dropoff_longitude"] ].apply(loc,axis=1)
df.to_csv("result.csv",index=False)

I have added a pic of my result.csv

Prune
Manish Bhanu
  • 1
    Post the result in the question, not as a link. Also, the link doesn't appear; I think you neglected to put the footnote number into your text. – Prune Jul 26 '17 at 18:12
  • Couldn't add the image, but the line in result.csv is as follows: -73.8441085815 40.7211074829 -73.8163299561 40.7143745422 (40.7211074829, -73.8441085815) (40.7143745422, -73.81632995609999). The last two columns are "src" and "des". – Manish Bhanu Jul 26 '17 at 18:20
  • 1
    Again, post that result in the *question*. – Prune Jul 26 '17 at 18:22
  • 1
    You tried to put your image in the middle of a code block, so it just wound up as `round(x[enter image description here][1][1],14)`. – user2357112 Jul 26 '17 at 18:24
  • You've hit the problem with floats. They aren't exact. They are a representation of a complicated concept, so when you put a value into that representation, it may change that value ever so slightly. I'm not fully up on pandas, but I believe every DataFrame value has a .round() method. Use .round(10) on every value before storing it in the DataFrame you're using to output to the CSV. Or cast them all as strings from the outset since you're not manipulating them in any way. That's probably easier. – Alan Leuthard Jul 26 '17 at 18:32

2 Answers

1

From what I can tell, -73.81632995609999 is the only number that is off, and it is off by a very small margin. This is a well-documented issue that arises from how Python stores and displays floating-point numbers. From the Python website:

On a typical machine running Python, there are 53 bits of precision available for a Python float, so the value stored internally when you enter the decimal number 0.1 is the binary fraction 0.00011001100110011001100110011001100110011001100110011010 which is close to, but not exactly equal to, 1/10.

It’s easy to forget that the stored value is an approximation to the original decimal fraction, because of the way that floats are displayed at the interpreter prompt. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display

>>> 0.1
0.1000000000000000055511151231257827021181583404541015625

More specifically about small representation errors:

Representation error refers to the fact that some (most, actually) decimal fractions cannot be represented exactly as binary (base 2) fractions. This is the chief reason why Python (or Perl, C, C++, Java, Fortran, and many others) often won’t display the exact decimal number you expect:

>>> 0.1 + 0.2
0.30000000000000004

Why is that? 1/10 and 2/10 are not exactly representable as a binary fraction. Almost all machines today (July 2010) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2**N where J is an integer containing exactly 53 bits.
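The figures quoted above can be checked from the standard library. This is a minimal sketch that only inspects the stored value of 0.1; it assumes the usual IEEE-754 double-precision floats and does not touch the CSV from the question.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact = Decimal(0.1)
print(exact)

# The sum of two approximations is itself an approximation.
total = 0.1 + 0.2
print(repr(total))

# The stored value of 0.1 is exactly J / 2**N with J = 7205759403792794, N = 56.
ratio = Fraction(0.1)
print(ratio == Fraction(7205759403792794, 2**56))
```

Printing `exact` shows the long decimal expansion from the quote, which is why the short `0.1` seen at the prompt is only a display convenience.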

In short, this modification occurs because of how Python stores numbers. You could try multiplying by 10^n and storing the values as integers, then dividing when you need them for calculations. If you're doing simple calculations, the small difference introduced by the float representation shouldn't have a substantial impact. Hope this helps.
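One way the scaled-integer idea could look is sketched below. The helper names `to_scaled_int` and `from_scaled_int` and the scale of 10 decimal places are assumptions for illustration (the sample coordinates happen to have 10 decimals), and the conversion goes through `decimal.Decimal` on the original text so that no binary float is involved.

```python
from decimal import Decimal

SCALE = 10  # assumed number of decimal places in the input text


def to_scaled_int(text, scale=SCALE):
    """Convert a decimal string such as '-73.8163299561' to an exact integer."""
    # scaleb shifts the decimal exponent, i.e. multiplies by 10**scale exactly.
    return int(Decimal(text).scaleb(scale))


def from_scaled_int(value, scale=SCALE):
    """Recover the exact decimal value from the scaled integer."""
    return Decimal(value).scaleb(-scale)


stored = to_scaled_int("-73.8163299561")
print(stored)                   # integer with no fractional part
print(from_scaled_int(stored))  # same digits as the original text
```

Values with more than `SCALE` decimals would be truncated by `int()`, so the scale has to be chosen to cover the data.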

mjmccolgan
0

This is not specific to Python; it's a problem with the accuracy of real (float) numbers. The internal binary representation is not exact for most decimal fractions; the run-time system gives you the closest approximation it can manage. When that binary number is written out, you get the best decimal representation of the internal binary value; the original input text is lost once you convert to float.

If you want to maintain exactly the digits you had in the original file, you'll need to perform the necessary formatting on output (see Python output formats for help), or perhaps do all of your processing as strings, without converting to float.
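Both options could be sketched with pandas roughly as follows. The inline CSV text stands in for demo.csv and its column order is an assumption, as is the fixed width of 10 decimal places used for the formatting variant. `dtype=str` is a real `read_csv` parameter that keeps every cell as the text found in the file.

```python
import io

import pandas as pd

# Stand-in for demo.csv (assumed layout, one sample row from the comments).
csv_text = (
    "Pickup_longitude,Pickup_latitude,Dropoff_longitude,Dropoff_latitude\n"
    "-73.8441085815,40.7211074829,-73.8163299561,40.7143745422\n"
)

# Option 1: never convert to float; build the pair as text.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
df["src"] = "(" + df["Pickup_latitude"] + ", " + df["Pickup_longitude"] + ")"
df["des"] = "(" + df["Dropoff_latitude"] + ", " + df["Dropoff_longitude"] + ")"
result = df.to_csv(index=False)

# Option 2: keep floats, but format them explicitly on output.
df_float = pd.read_csv(io.StringIO(csv_text))
formatted_des = df_float.apply(
    lambda row: "({:.10f}, {:.10f})".format(
        row["Dropoff_latitude"], row["Dropoff_longitude"]
    ),
    axis=1,
)

print(df.loc[0, "des"])
print(formatted_des.iloc[0])
```

The string route preserves whatever digits the file contains, while the formatting route only works when every value has the same number of decimals.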

Does that get you moving?

Prune