0

I have a data table I am pulling from excel that looks like this:

Raw pH   TAPA
8.20      30
8.21      29
8.22      28.5
8.23      28
8.24      27
8.25      26.5
8.26      26

I usually have no problem looking up a number in one column based on another column. For instance, when, I code:

df = pd.read_excel('Conversions.xlsx', 'Sheet1')
df2=df.loc[df['Raw pH'] == 8.21, 'TAPA']
print(df2)

I would expect it to return "29". However, it is only working when I look for the first pH value of 8.20;

df2=df.loc[df['Raw pH'] == 8.20, 'TAPA']
print(df2)

Returns:

0    30.0
Name: TAPA, dtype: float64

When I try any other number, such as:

df2=df.loc[df['Raw pH'] == 8.23, 'TAPA']
print(df2)

I simply get an empty value:

Series([], Name: TAPA, dtype: float64)

If I try it with iloc:

df2=df.loc[df['Raw pH'] == 8.23, 'TAPA'].iloc[0]
print(df2)

I get the error IndexError: single positional indexer is out-of-bounds.

If I try the iloc() method with 8.2, it returns 30.0 as expected.

Why am I getting back empty values for everything below the first data row?

nathan liang
  • 1,000
  • 2
  • 11
  • 22
Smashley
  • 47
  • 6
  • 2
    Could you please post your dataframe as copyable code instead of an image? Thanks! – nathan liang Mar 24 '22 at 19:06
  • 4
    It might be useful for us to know what `df` looks like. But my first hunch would be that the floats aren't exactly the values you test for. You may want to test for `df.loc[(df['Raw pH'] >= 8.225) & (df['Raw pH'] <= 8.235), 'TAPA']`. – Freek Mar 24 '22 at 19:08
  • I *strongly* believe the problem is that some of your numbers are not in the numerical type you think they are in. They might be read in as strings, so first cast everything to float with `df['Raw pH'] = df['Raw pH'].astype(float)` – nathan liang Mar 24 '22 at 19:12
  • Excel is probably rounding for you and you are not seeing the actual pHs. – ifly6 Mar 24 '22 at 19:12
  • 4
    Because you are dealing with floats. Use `np.isclose` – Scott Boston Mar 24 '22 at 19:15
  • See [this Q/A](https://stackoverflow.com/questions/33626443/comparing-floats-in-a-pandas-column) for reference – mozway Mar 24 '22 at 19:28
  • 1
    @NathanLiang I made the table code so you could copy. I also tried the .astype(float) and I'm having the same issues as before. I added a decimal place in excel to confirm that there are no non-zero digits after the first two decimal places. This is so weird. – Smashley Mar 24 '22 at 20:41
  • Thanks for reformatting! I see that the issue is now with misbehaving floats. You can take a look at what @ScottBoston mentioned—Python floats are rounded very strangely sometimes, and see the post by mozway explaining this. – nathan liang Mar 24 '22 at 20:51
  • maybe you have column with strings and `"8.20" == 8.20` gives `False` – furas Mar 24 '22 at 22:43

1 Answers1

0

I ended up just making the excel worksheet a CSV file instead. It's working fine now. I still don't understand why -- the datatype in excel looks straightforward.

Smashley
  • 47
  • 6
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 25 '22 at 00:11