2

Ok, I have a pandas dataframe like this:

         lat      long    level        date    time    value
3341  29.232   -15.652     10.0  20100109.0   700.0      0.5
3342  27.887   -13.668    120.0  20100109.0   700.0      3.2
...
3899  26.345   -11.234      0.0  20100109.0   700.0      5.8

The reason of the strange number of the index is because it comes from a csv converted to pandas dataframe with some values filtered. Columns level, date, time are not really relevant.

I am trying, in ipython, to see the some rows filtering by latitude, so I do (if the dataframe is c):

c[c['lat'] == 26.345]

or

c.loc[c['lat'] == 26.345]

and I can see if the value is present or not, but sometimes it outputs nothing for latitude values that I am seeing in the dataframe !?! (For instance, I can see in the dataframe the value of latitude 27.702 and when I do c[c['lat'] == 27.702] or c.loc[c['lat'] == 27.702] I get an empty dataframe and I am seeing the value for such latitude). What is happening here?

Thank you.

David
  • 1,155
  • 1
  • 13
  • 35
  • 1
    Are all of the `lat` values numeric? Some of them are probably strings, so when you check equality against a numeric, it won't return anything. Check the type of the column. If it's `object`, that's what's going on. You can use the [to_numeric()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html) function from pandas to convert it. – 3novak Nov 24 '16 at 15:56
  • No, I am afraid that I have done `pd.to_numeric` and it's the same... – David Nov 24 '16 at 16:02

3 Answers3

4

This is probably because you are asking for an exact match against floating point values, which is very, very dangerous. They are approximations, often printed to less precision than actually stored.

It's very easy to see 0.735471 printed, say, and think that's all there is, when in fact the value is really 0.73547122072282867; the display function has simply truncated the result. But when you try a strict equality test on the attractively short value, boom. Doesn't work.

Instead of

c[c['lat'] == 26.345]

Try:

import numpy as np

c[np.isclose(c['lat'], 26.345)]

Now you'll get values that are within a certain range of the value you specified. You can set the tolerance.

Jonathan Eunice
  • 21,653
  • 6
  • 75
  • 77
2

It is a bit difficult to give a precise answer, as the question does not contain reproducible example, but let me try. Most probably, this is due floating point issues. It is possible that the number you see (and try to compare with) is not the same number that is stored in the memory due to rounding. For example:

import numpy as np
x = 0.1
arr = np.array([x + x + x])
print(np.array([x + x + x]))
# [ 0.3]
print(arr[arr == 0.3])
# []
print(x + x + x)
# 0.30000000000000004
# in fact 0.1 is not exactly equal to 1/10, 
# so 0.1 + 0.1 + 0.1 is not equal to 0.3

You can overcome this issue using np.isclose instead of ==:

print(np.isclose(arr, 0.3))
# [ True]
print(arr[np.isclose(arr, 0.3)])
# [ 0.3]
Community
  • 1
  • 1
Ilya V. Schurov
  • 7,687
  • 2
  • 40
  • 78
  • @Jonathan The problem was with the float quantity. I have never had a float situation before and it didn't occur to me that the representation of a float could be the key. Thanxs. – David Nov 24 '16 at 16:17
  • If you want to give a comment to @Jonothan, it is better to comment on his answer, I believe :) (It's on top now.) – Ilya V. Schurov Nov 24 '16 at 16:21
  • 1
    Thanks, Ilya, but in the help section for comments it is written that if you mention @Jonathan and he is in the previous comment he also receives the comment ;) – David Nov 24 '16 at 16:23
2

In addition to the answers addressing comparison on floating point values, some of the values in your lat column may be string type instead of numeric.

EDIT: You indicated that this is not the problem, but I'll leave this response here in case it helps someone else. :)

Use the to_numeric() function from pandas to convert them to numeric.

import pandas as pd

df['lat'] = pd.to_numeric(df['lat'])
# you can adjust the errors parameter as you need
df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
3novak
  • 2,506
  • 1
  • 17
  • 28