
I have this dataframe:

import pandas as pd

my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500]
})

    A   B   price
0   a0  b0  100
1   a1  b1  10
2   a2  b2  50
3   a3  b3  500

I can use this piece of code to select the rows whose price is at or above the 90th percentile (that is, the top 10 percent of prices).

q_90 = my_df['price'].quantile(q=0.9)
my_df[my_df['price'] >= q_90]

    A   B   price
3   a3  b3  500

Does the pandas DataFrame have a built-in method that does this directly and faster, such as:

my_df.some_method(q=0.9)

    A   B   price
3   a3  b3  500
Amin Ba
    No there isn't such a built-in method because what you want to do is very specific and composed of two more basic and generalizable methods -- calculating a quantile and subsetting with a Boolean mask. But it should be trivial to take what you have and turn it into a function that would have a signature like: `df1 = quantile_subset(my_df, q=0.9)` – ALollz Jun 07 '21 at 15:39
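The helper suggested in the comment could be sketched roughly as follows. The name `quantile_subset` comes from the comment; the `column` parameter and its default of `'price'` are assumptions added here for illustration, not part of any pandas API. This only wraps the two steps from the question and is not expected to be faster than them.

```python
import pandas as pd


def quantile_subset(df, q=0.9, column="price"):
    """Return the rows of df whose `column` value is at or above the q-th quantile.

    `column` is a hypothetical parameter added for this sketch.
    """
    threshold = df[column].quantile(q)   # scalar cutoff value
    return df[df[column] >= threshold]   # Boolean-mask subsetting


my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500]
})

df1 = quantile_subset(my_df, q=0.9)
print(df1)
```

With the sample data, the 0.9 quantile falls between 100 and 500 under pandas' default linear interpolation, so only the row with price 500 is kept.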

1 Answer


I did not find what you are looking for, but you may want to check this other post: Eliminating all data over a given percentile.
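The linked post is not reproduced here, so the following is only a guess at the idea its title describes: the complementary filter, which drops rows above a percentile instead of keeping them. The function name `drop_above_quantile` and its parameters are hypothetical.

```python
import pandas as pd


def drop_above_quantile(df, column, q=0.9):
    """Keep only the rows whose `column` value is at or below the q-th quantile."""
    cutoff = df[column].quantile(q)   # scalar cutoff value
    return df[df[column] <= cutoff]   # rows above the cutoff are dropped


my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500]
})

kept = drop_above_quantile(my_df, 'price', q=0.9)
print(kept)
```

On the sample data this keeps rows 0 to 2 and removes the row with price 500.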

Dharman