1

I am new using pandas and I would really appreciate if someone can help me to solve this question.

I have the following dataframe:

import pandas as pd
# sample dataframe
df = pd.DataFrame({
  'Price': [-3000, -262, 150, -3000, -262, 150, -3000, -262, 150, -3000, -262, 0],
  'Volume': [8133, 28287, 19289, 20242, 19428, 28322, 18147, 17234, 12133,18147, 17234, 12133],})

print('Raw dataframe:')
print(df, '\n')

I want to get rid of those observations were the number of times that ¨price¨ appears is not equal to 4. In this example: -3000 and -262 are the only price observations that appear 4 times, therefore I would like just to keep those.

I would be really thankful if someone can help me.

angelavtc
  • 41
  • 1
  • 6

2 Answers2

3

Use GroupBy.transform with size for count per groups to Series with same size like original df, so possible filter by boolean indexing:

df = df[df.groupby('Price')['Price'].transform('size') == 4]

Or use DataFrameGroupBy.filter, but it should be slowier in large DataFrames:

df = df.groupby('Price').filter(lambda x: len(x) == 4)

Or Series.map with Series.value_counts:

df = df[df['file'].map(df['file'].value_counts()) == 4]

print (df)
    Price  Volume
0   -3000    8133
1    -262   28287
3   -3000   20242
4    -262   19428
6   -3000   18147
7    -262   17234
9   -3000   18147
10   -262   17234
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

you can also use Series.value_counts + Series.map:

df[df['Price'].map(df['Price'].value_counts()) == 4)]

    Price  Volume
0   -3000    8133
1    -262   28287
3   -3000   20242
4    -262   19428
6   -3000   18147
7    -262   17234
9   -3000   18147
10   -262   17234
ansev
  • 30,322
  • 5
  • 17
  • 31