
I import a csv data file into a pandas DataFrame df with pd.read_csv. The text file contains a column with strings like these:

y
0.001
0.0003
0.0001
3e-05
1e-05
1e-06

If I print the DataFrame, pandas outputs the decimal representation of these values with 6 digits after the decimal point, and everything looks good.

When I try to select rows by value, typing the corresponding decimal representation of value, like here:

df[df['y'] == value]

pandas correctly matches some values (for example rows 0, 2, 4) but does not match others (rows 1, 3, 5). This is of course because those rows' values do not have an exact representation in base two.
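
For reference, the setup can be reproduced roughly like this (using io.StringIO to stand in for the actual file, with the column values copied from above):

import io
import pandas as pd

# Rebuild the column by parsing the same text shown above
csv_text = "y\n0.001\n0.0003\n0.0001\n3e-05\n1e-05\n1e-06\n"
df = pd.read_csv(io.StringIO(csv_text))

# Exact equality may or may not match a given row, depending on how the
# parsed float compares bit-for-bit with the literal typed here
print(df[df['y'] == 0.0003])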

I was able to work around this problem in this way:

df[abs(df['y']/value-1) <= 0.0001]

but it seems somewhat awkward. What I'm wondering is: numpy already has a function, np.isclose, that exists specifically for this purpose.

Is there a way to use np.isclose in a case like this? Or is there a more direct solution in pandas?


2 Answers


Yes, you can use numpy's isclose:

import numpy as np

df[np.isclose(df['y'], value)]
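
As a self-contained sketch (the DataFrame below is rebuilt from the values in the question, not read from the original file): np.isclose uses a relative tolerance rtol=1e-05 and an absolute tolerance atol=1e-08 by default, and both can be adjusted if needed.

import numpy as np
import pandas as pd

df = pd.DataFrame({'y': [0.001, 0.0003, 0.0001, 3e-05, 1e-05, 1e-06]})
value = 3e-05

# Tighten the relative tolerance and drop the absolute tolerance,
# so only values within 0.0001% of `value` are selected
print(df[np.isclose(df['y'], value, rtol=1e-06, atol=0.0)])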

You can convert the values to int, since floating-point values might not compare as equal:

df.loc[df["sum"].astype(int) == int(value)]
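
As a sketch of how that would look, assuming the column holds integer-valued numbers (note that the fractional values in the question would all truncate to 0 under .astype(int), so this only applies to whole-number data):

import pandas as pd

# Hypothetical integer-valued data; the column name 'sum' is taken from the answer
df = pd.DataFrame({'sum': [10.0, 20.0, 30.000000000001]})
value = 30.0

# Truncate both sides to int before comparing
print(df.loc[df['sum'].astype(int) == int(value)])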
