Python/Pandas - drop observations from a dataset

Question

I am new using pandas and I would really appreciate if someone can help me to solve this question.

I have the following dataframe:

import pandas as pd
# sample dataframe
df = pd.DataFrame({
  'Price': [-3000, -262, 150, -3000, -262, 150, -3000, -262, 150, -3000, -262, 0],
  'Volume': [8133, 28287, 19289, 20242, 19428, 28322, 18147, 17234, 12133,18147, 17234, 12133],})

print('Raw dataframe:')
print(df, '\n')

I want to get rid of those observations were the number of times that ¨price¨ appears is not equal to 4. In this example: -3000 and -262 are the only price observations that appear 4 times, therefore I would like just to keep those.

I would be really thankful if someone can help me.

jezrael · Answer 1 · 2019-10-24T07:57:53.990

Use GroupBy.transform with size for count per groups to Series with same size like original df, so possible filter by boolean indexing:

df = df[df.groupby('Price')['Price'].transform('size') == 4]

Or use DataFrameGroupBy.filter, but it should be slowier in large DataFrames:

df = df.groupby('Price').filter(lambda x: len(x) == 4)

Or Series.map with Series.value_counts:

df = df[df['file'].map(df['file'].value_counts()) == 4]

print (df)
    Price  Volume
0   -3000    8133
1    -262   28287
3   -3000   20242
4    -262   19428
6   -3000   18147
7    -262   17234
9   -3000   18147
10   -262   17234

ansev · Answer 2 · 2019-10-15T11:29:03.713

1

you can also use Series.value_counts + Series.map:

df[df['Price'].map(df['Price'].value_counts()) == 4)]

    Price  Volume
0   -3000    8133
1    -262   28287
3   -3000   20242
4    -262   19428
6   -3000   18147
7    -262   17234
9   -3000   18147
10   -262   17234

edited Oct 15 '19 at 11:29

answered Oct 15 '19 at 11:23

ansev

30,322
5
17
31

Python/Pandas - drop observations from a dataset

2 Answers2