I have a DataFrame with two columns, A and B. I want to get the scalar value from column B for the row where column A has a given value. I used `loc` and `.values[0]`.

My data volume is relatively small; my main concern is whether the syntax of the code is correct. `.values` seems to be discouraged now in favor of `.to_numpy()`.

import pandas as pd
import numpy as np

# Build a 5x2 frame: A = 0, 2, 4, 6, 8 and B = 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])
df1 = df.loc[df['A'] == 4, 'B'].values[0]
print(df1)

The result is

5

Can this code be optimized?

df1 = df.loc[df['A'] == 4, 'B'].values[0]

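In my timings, going through NumPy with `to_numpy()[0]` is about as fast as `.iloc[0]` on the `loc` result (the difference is within the measurement noise), and both are faster than filtering the whole frame first: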

%timeit df1 = df[df['A'] == 4].B.iloc[0]
723 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1=df.loc[df['A'] == 4, 'B'].to_numpy()[0]
513 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1 = df.loc[df['A'] == 4, 'B'].iloc[0]
521 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
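
The `%timeit` lines above only work in IPython or Jupyter. A minimal sketch of the same comparison with the standard `timeit` module follows; the `candidates` and `timings` names and the repeat counts are illustrative choices, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])

# The three expressions timed above, wrapped as zero-argument callables
candidates = {
    "df[mask].B.iloc[0]": lambda: df[df['A'] == 4].B.iloc[0],
    "loc + to_numpy()[0]": lambda: df.loc[df['A'] == 4, 'B'].to_numpy()[0],
    "loc + iloc[0]": lambda: df.loc[df['A'] == 4, 'B'].iloc[0],
}

timings = {}
for label, func in candidates.items():
    # Best of 5 repeats of 200 calls each, converted to microseconds per call
    best = min(timeit.repeat(func, number=200, repeat=5)) / 200
    timings[label] = best * 1e6
    print(f"{label}: {timings[label]:.1f} µs per call")
```

On a 5-row frame the differences are tiny, so a small gap between two variants should not be read as a real ranking.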
jaried
  • Instead of `values[0]` you can use `to_numpy()[0]`. That might run a bit faster. – Mayank Porwal Apr 08 '21 at 06:19
  • Not sure if this is faster: `df.set_index('A').at[4, 'B']`. Also, your current code can be written as `df.loc[df['A'] == 4, 'B'].item()`. Again, you may have to test. – sammywemmy Apr 08 '21 at 06:23
  • My data volume is relatively small; my main concern is whether the syntax of the code is correct. `.values` seems to be deprecated. – jaried Apr 08 '21 at 06:29
  • @jaried Based on your last comment, I am closing this as a duplicate, since `to_numpy` vs `values` has already been discussed. – Mayank Porwal Apr 08 '21 at 06:32
  • @MayankPorwal - OK, then I am reopening it, because I think using `.to_numpy()` here is not a good idea (I added the reason to my answer). – jezrael Apr 08 '21 at 06:36
  • @jezrael I think the OP just needed to validate his syntax, so `to_numpy` is a perfectly good answer. But again, there is no point discussing this with you. Thank you. – Mayank Porwal Apr 08 '21 at 06:39
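
The alternatives suggested in the comments can be checked against the original expression. This is a minimal sketch on the question's sample frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])

# Original expression from the question
original = df.loc[df['A'] == 4, 'B'].values[0]

# Series.item() returns the single element of a one-element Series
via_item = df.loc[df['A'] == 4, 'B'].item()

# Label-based scalar lookup after making A the index
via_at = df.set_index('A').at[4, 'B']

print(original, via_item, via_at)  # 5 5 5
```

Note that `.item()` raises a `ValueError` unless exactly one row matches, and the `set_index(...).at[...]` form only gives a scalar when the values in A are unique, so neither is a drop-in replacement when several rows or no rows can match.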

1 Answer

If you need a solution that also returns a default value when no row matches the condition, use `next` with `iter`:

a = next(iter(df.loc[df['A'] == 4, 'B']), 'no match')
print (a)
5

a = next(iter(df.loc[df['A'] == 1000, 'B']), 'no match')
print (a)
no match
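
The `next`/`iter` pattern above could be wrapped in a small helper so the default is handled in one place. This is only a sketch: `first_match` is a hypothetical name, not a pandas function, and it hard-codes the columns A and B from the question.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: A = 0, 2, 4, 6, 8 and B = 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])


def first_match(frame, key, default=None):
    """Return B from the first row where A == key, or `default` if none match.

    Hypothetical helper for illustration only.
    """
    matches = frame.loc[frame['A'] == key, 'B']
    # iter() walks the matching values in order; next() takes the first one
    # and falls back to `default` when the selection is empty.
    return next(iter(matches), default)


print(first_match(df, 4))                 # 5
print(first_match(df, 1000, 'no match'))  # no match
```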

If a match is always guaranteed, you can use `Series.to_numpy`, but it raises an `IndexError` when no row matches, so it is safer not to rely on it:

df.loc[df['A'] == 4, 'B'].to_numpy()[0]
# but this raises IndexError because no row matches:
# df.loc[df['A'] == 1000, 'B'].to_numpy()[0]
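
To make the failure mode concrete, here is a minimal sketch on the question's sample frame showing that both `to_numpy()[0]` and `.iloc[0]` raise on an empty selection, and one possible guard using `Series.empty`. The names `empty`, `errors`, and `value` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])

empty = df.loc[df['A'] == 1000, 'B']  # no row has A == 1000

errors = []
try:
    empty.to_numpy()[0]
except IndexError as exc:
    errors.append(('to_numpy()[0]', type(exc).__name__))

try:
    empty.iloc[0]
except IndexError as exc:
    errors.append(('iloc[0]', type(exc).__name__))

print(errors)

# Checking .empty first avoids the exception
value = empty.to_numpy()[0] if not empty.empty else 'no match'
print(value)  # no match
```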
jezrael