
I ran the same Python code in PyCharm on Linux and on Windows. On Linux the number was printed as -91.35357; on Windows it was -91.35356999999999. The problem is that this value is part of the file name I need to open (and the list of files to open is long).

Does anyone know a possible explanation and how to fix it?

  • It's a really bad idea to have floating point numbers in file names. If you want a consistent representation, be explicit about how many decimal places should be displayed. – jonrsharpe Apr 26 '18 at 12:13
  • We can't do anything unless you show us the code. See: Minimal, Complete, and Verifiable example. – Mark Dickinson Apr 26 '18 at 12:25
  • @jonrsharpe: It was necessary to have floating point numbers in the file names because they refer to the latitude and longitude of the data location. Also, latitude and longitude do not have the same number of decimals for each location. – Gordan Mimic Apr 26 '18 at 12:39

2 Answers


Your PyCharm on Linux is simply rounding off your long floating point number. Rounding it to 6 or 7 decimal places can resolve your issue, but DON'T USE THESE AS FILE NAMES.

Assuming your code is identical in both cases, there can be several explanations:

1) 32-bit processors handle floats differently than 64-bit processors.

2) PyCharm on Linux and PyCharm on Windows may behave differently with floating point numbers, for reasons we cannot determine exactly; maybe PyCharm for Windows is better optimised.

edit 1

Explanation for Point 1

On 32-bit processors, floating point arithmetic is often done internally in 80-bit precision (the x87 FPU). The precision of the type really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: they change how often values are rounded from 80-bit to 32- or 64-bit.

edit 2

You can use a hash map (a Python dict) to map the coordinates onto file names, and save your data under those file names instead. Example:

# variable = {(long, lat): "<random_file_name>"}
cordinates_and_file = {(-92.45453534, -87.2123123): "AxdwaWAsdAwdz"}
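A slightly fuller sketch of the mapping idea. It assumes (this is not part of the original answer) that coordinates are rounded to a fixed number of decimals before being used as keys, so values that differ only in the last binary digits still find the same entry. The names `DECIMALS`, `make_key` and `file_for` are hypothetical.

```python
# Hypothetical example: map rounded (longitude, latitude) pairs to file names.
DECIMALS = 5  # assumption: 5 decimals (roughly a metre) separates the locations


def make_key(lon: float, lat: float) -> tuple:
    """Round both coordinates so tiny float differences give the same key."""
    return (round(lon, DECIMALS), round(lat, DECIMALS))


coordinates_and_file = {
    make_key(-92.45453534, -87.2123123): "AxdwaWAsdAwdz",
}


def file_for(lon: float, lat: float) -> str:
    """Look up the file name for a coordinate pair (raises KeyError if unknown)."""
    return coordinates_and_file[make_key(lon, lat)]


# A value that differs only far beyond the 5th decimal maps to the same file.
print(file_for(-92.45453534000001, -87.2123123))
```

Rounding reduces, but does not remove, the concern raised in the comments: two values that straddle a rounding boundary (e.g. just below and just above ...xxxxx5) can still round to different keys.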
NoorJafri
  • @NoorAliJafri: It was necessary to have floating point numbers in the file names because they refer to the latitude and longitude of the data location. Also, latitude and longitude do not have the same number of decimals for each location. However, on both Linux and Windows the latitudes and longitudes are float64. – Gordan Mimic Apr 27 '18 at 09:34
  • I would recommend using a hash map for this problem, for example: cordinates_and_file = {(-92.45453534, -87.2123123): "AxdwaWAsdAwdz"} – NoorJafri Apr 27 '18 at 09:45
  • I think that, for the same reason explained above, you should not use floats as hash keys. – jjmontes Apr 30 '18 at 11:55
  • But it's still better than building the whole file names from your float results. :D I know it's not the proper way. – NoorJafri Apr 30 '18 at 12:09

Floats

Always remember that float numbers have limited precision. If you think about it, there must be a limit to how exactly you can represent a number if you limit storage to 32 or 64 bits (or any other number).

In Python

Python provides just one float type. Float numbers are usually implemented using 64 bits, but they might be 64-bit in one Python binary and 32-bit in another, so you can't really rely on that (however, see @Mark Dickinson's comment below).
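To check what your own Python binary uses, rather than guessing from the processor architecture, a quick sketch using only the standard `struct` and `sys` modules:

```python
import struct
import sys

# Size in bytes of a C double, which is what a Python float is stored as.
double_size = struct.calcsize("d")
print("C double size:", double_size, "bytes")

# 53 mantissa bits and 15 reliable decimal digits indicate IEEE 754 binary64.
print("mantissa bits:", sys.float_info.mant_dig)
print("decimal digits:", sys.float_info.dig)
```

On typical Linux and Windows builds this reports 8 bytes, 53 mantissa bits and 15 digits on both 32-bit and 64-bit machines, in line with Mark Dickinson's comment.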

Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:

>>> import numpy
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679

Here we can see that Python, on my computer, handles the variable as a float64. This already truncates the number we introduced (because a float64 can only handle so much precision).

When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
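If NumPy is not available, the same float32 rounding can be reproduced with the standard `struct` module by packing the value into 4 bytes and unpacking it again. A minimal sketch; `to_float32` is a hypothetical helper name:

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = 1.23456789012345678901234567890
print(repr(n))              # stored as a float64
print(repr(to_float32(n)))  # float32 value, printed with float64 digits

lat = 123.456789
print("{0:.6f}".format(to_float32(lat)))  # same digits as numpy.float32(lat)
```

Unlike `numpy.float32`, the result here is an ordinary Python float, so `repr()` shows more digits than NumPy would; the stored value is the same float32 value.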

Conclusion

Float resolution is limited. Furthermore, some operations behave differently across different architectures.

Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.

Comparing a float to another float should be done considering a possible error margin. Do not use float_a == float_b; instead use abs(float_a - float_b) < error_margin.
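Since Python 3.5 the standard library provides this comparison as `math.isclose`. A short sketch; the tolerances chosen here are assumptions you should adapt to your data:

```python
import math

a = 0.1 + 0.2
b = 0.3

exact = a == b                                    # False: binary rounding error
manual = abs(a - b) < 1e-9                        # True: manual error margin
relative = math.isclose(a, b, rel_tol=1e-9)       # True: relative tolerance
near_zero = math.isclose(0.0, 1e-12, abs_tol=1e-9)  # True: abs_tol needed near 0

print(exact, manual, relative, near_zero)
```

A relative tolerance alone never treats a nonzero value as close to 0.0, which is why `abs_tol` exists for comparisons near zero.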

Relying on float representations is always a bad idea. Python sometimes uses scientific notation:

>>> a = 0.0000000001
>>> str(a)
'1e-10'

You can get a consistent rounded representation (e.g. to use in file names), but remember that storage and representation are different things. This other thread may help you: "Limiting floats to two decimal points".
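For the original problem, one way to get the same file name on every platform is to format coordinates with an explicit, fixed number of decimals instead of relying on `str()`/`repr()`. A sketch, assuming 5 decimals and a made-up naming pattern `data_<lat>_<lon>.txt` (both are assumptions; adapt them to your files):

```python
def coord_filename(lat: float, lon: float, decimals: int = 5) -> str:
    """Build a file name from coordinates using a fixed number of decimals.

    The naming pattern is hypothetical; only the fixed-precision formatting
    matters. Fixed-point ('f') formatting also avoids scientific notation.
    """
    return "data_{0:.{2}f}_{1:.{2}f}.txt".format(lat, lon, decimals)


# Both printed forms of the value from the question give the same name.
print(coord_filename(12.5, -91.35357))
print(coord_filename(12.5, -91.35356999999999))
print(coord_filename(0.0000000001, 1.0))  # no '1e-10' in the name
```

If the existing file names use a varying number of decimals, as mentioned in the comments, `repr(round(value, 5))` keeps Python's shortest representation (e.g. `-91.35357` or `12.5`) and may be closer to the names you already have; whether it matches depends on how those names were originally produced.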

In general, I'd advise against using float numbers in file names or as any other kind of identifier.

Latitude / Longitude

float32 numbers do not have enough precision to represent the 5th and 6th decimal digits of latitude/longitude values (depending on whether the integer part has one, two or three digits).
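You can see where this limit comes from by measuring the gap between a float32 value and the next representable one. A sketch using `struct`; the helper name `float32_spacing` is hypothetical (NumPy users could use `numpy.spacing` on a `numpy.float32` value instead):

```python
import struct


def float32_spacing(x: float) -> float:
    """Distance from float32(x) to the next larger float32 (finite x > 0)."""
    # Reinterpret the 4 float32 bytes as an unsigned integer bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    # For positive floats, bit pattern + 1 is the next representable value.
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here


# One, two and three integer digits, as in latitude/longitude values.
for value in (1.5, 45.5, 123.5):
    print(value, float32_spacing(value))
```

The gaps come out at roughly 1.2e-07, 3.8e-06 and 7.6e-06 degrees. Half of the gap is the worst-case rounding error, so with three integer digits a float32 coordinate is only reliable to a few millionths of a degree.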

If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
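You can also inspect the stored bits directly from Python instead of using the web page. A sketch with `float.hex()` plus `struct` for the raw bit patterns; `float32_hex` and `float64_hex` are hypothetical helper names:

```python
import struct


def float32_hex(x: float) -> str:
    """Raw IEEE 754 binary32 bit pattern of x (after rounding), as hex."""
    return struct.pack(">f", x).hex()


def float64_hex(x: float) -> str:
    """Raw IEEE 754 binary64 bit pattern of x, as hex."""
    return struct.pack(">d", x).hex()


value = -91.35357
print(value.hex())          # sign, mantissa and exponent in hex notation
print(float64_hex(value))   # 16 hex digits = 64 bits
print(float32_hex(value))   # 8 hex digits = 32 bits
```

The bit patterns can be compared directly with the values shown by the converter page.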

Representing

Note that Python rounds float values when representing them:

>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'

And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:

>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'

As you can see, the float32 number printed to 6 decimals (123.456787) no longer matches the original number (123.456789). The float64 number rounded to 5 decimals does not match the original digits either: its 5th decimal becomes 9 instead of 8.
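To see the exact value that is actually stored, as opposed to the rounded text Python shows, `decimal.Decimal` can convert a float without any rounding. A short sketch:

```python
from decimal import Decimal

lat = 123.456789
shown = repr(lat)        # shortest text that maps back to the same float
stored = Decimal(lat)    # exact binary value held in memory (many more digits)

print(shown)
print(stored)
print(stored == Decimal("123.456789"))  # False: not exactly representable
```

This is the "storage vs. representation" distinction from above: `repr()` gives a short, rounded string, while the stored value has a long exact binary expansion.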

jjmontes
  • "float numbers are implemented using the default architecture word size." This is false. Both 32-bit machines and 64-bit machines use the C double. On the vast majority of machines (32-bit or 64-bit), that's the IEEE 754 binary64 type. – Mark Dickinson Apr 26 '18 at 16:43
  • @jjmontes: Thanks for the explanation, numpy.float32(n) could help – Gordan Mimic Apr 27 '18 at 08:18