How to subsample the rows of a DataFrame at or above the 90th percentile of the price column?

Question

I have this dataframe:

import pandas as pd

my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500],
})

    A   B   price
0   a0  b0  100
1   a1  b1  10
2   a2  b2  50
3   a3  b3  500

I can use this piece of code to subsample the rows whose price is at or above the 90th percentile:

q_90 = my_df['price'].quantile(q=0.9)
my_df[my_df['price'] >= q_90]

    A   B   price
3   a3  b3  500
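Note that `quantile(q=0.9)` returns the 90th-percentile value, so the mask above keeps roughly the top 10 percent of prices, not the top 90 percent. A minimal sketch contrasting the two readings on the same `my_df` (the names `top_10_pct` and `top_90_pct` are illustrative only):

```python
import pandas as pd

my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500],
})

# Rows at or above the 90th percentile: roughly the top 10 percent of prices.
top_10_pct = my_df[my_df['price'] >= my_df['price'].quantile(0.9)]

# Rows at or above the 10th percentile: roughly the top 90 percent of prices.
top_90_pct = my_df[my_df['price'] >= my_df['price'].quantile(0.1)]

print(top_10_pct)
print(top_90_pct)
```

With only four rows, the second filter keeps three of them (prices 100, 50 and 500) and drops the cheapest one.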

I am wondering whether a pandas DataFrame has a built-in method that does this directly, with better performance, such as:

my_df.some_method(q=0.9)

    A   B   price
3   a3  b3  500
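pandas has no such built-in method, but the method-call syntax itself can be emulated with `pd.api.extensions.register_dataframe_accessor`. A sketch, assuming a hypothetical accessor name `qsub` and a hypothetical `top` method that defaults to the `price` column:

```python
import pandas as pd


# Hypothetical accessor name "qsub"; pandas itself provides no such method.
@pd.api.extensions.register_dataframe_accessor("qsub")
class QuantileSubsetAccessor:
    """Adds my_df.qsub.top(q=...) as a thin wrapper around quantile + Boolean mask."""

    def __init__(self, pandas_obj: pd.DataFrame):
        self._obj = pandas_obj

    def top(self, q: float, column: str = "price") -> pd.DataFrame:
        # Keep rows whose `column` value is at or above its q-th quantile.
        values = self._obj[column]
        return self._obj[values >= values.quantile(q)]


my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500],
})

result = my_df.qsub.top(q=0.9)
print(result)
```

This only changes the calling syntax. The work done is the same quantile-plus-mask, so no speed-up should be expected.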

No, there isn't such a built-in method, because what you want to do is very specific and is composed of two more basic, generalizable operations: calculating a quantile and subsetting with a Boolean mask. But it should be trivial to take what you have and turn it into a function with a signature like `df1 = quantile_subset(my_df, q=0.9)`. (ALollz, Jun 07 '21 at 15:39)
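Following that suggestion, a minimal sketch of such a helper. `quantile_subset` and its `column` parameter are hypothetical (only the call shape comes from the comment). It uses nothing beyond `Series.quantile` and a Boolean mask, so it should not be expected to run faster than the two-line version in the question:

```python
import pandas as pd


def quantile_subset(df: pd.DataFrame, q: float, column: str = 'price') -> pd.DataFrame:
    """Return the rows of `df` whose `column` value is at or above its q-th quantile.

    Hypothetical helper: a thin wrapper around Series.quantile plus a Boolean mask.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    threshold = df[column].quantile(q)
    return df[df[column] >= threshold]


my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500],
})

df1 = quantile_subset(my_df, q=0.9)
print(df1)
```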

score -1 · Answer 1 · edited Jun 07 '21 at 15:49


I did not find what you are looking for, but you may want to check this other post: Eliminating all data over a given percentile.
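That post covers the inverse filter. As a sketch, assuming the same `my_df` as in the question and a 0.9 cut-off chosen for illustration, dropping the rows above a percentile only flips the comparison:

```python
import pandas as pd

my_df = pd.DataFrame({
    'A': 'a0,a1,a2,a3'.split(','),
    'B': 'b0,b1,b2,b3'.split(','),
    'price': [100, 10, 50, 500],
})

threshold = my_df['price'].quantile(0.9)

# Keep only rows strictly below the 90th percentile (drops the 500 row here).
below = my_df[my_df['price'] < threshold]

# Same result by inverting the question's mask; the two differ only if
# `price` contains NaN (NaN rows fail `<` but survive `~(>=)`).
also_below = my_df[~(my_df['price'] >= threshold)]

print(below)
```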

edited by Dharman

answered Jun 07 '21 at 15:44

Corentin
