
How do I apply a function to a DataFrame after grouping it, if I only want to apply the function to groups that have more than one member?

e.g.:

 # intent (not valid pandas): apply f only to groups with more than one row
 df1 = df.groupby(['x','y']).count() > 1.apply(f)

 def f(x):
     ...  # do something

Secondly, what is passed into the function: the elements of the group, or the group itself?

CodeGeek123

1 Answer


I think you need size:

df1 = df.groupby(['x','y']).size()

df1[df1 > 1] = df1[df1 > 1].apply(f)
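Note that in this first snippet `f` is never given a group: `size()` returns a Series with one integer per (x, y) group, and `Series.apply` calls `f` once per element, so `f` receives a plain count. That is consistent with the `TypeError: 'int' object is not subscriptable` reported in the comments. A minimal check, assuming the sample frame from this answer and a hypothetical `f` that records its argument:

```python
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9],
                   'x': [1, 1, 3],
                   'y': [5, 5, 6]})

received = []

def f(value):
    # hypothetical f: record what it is called with, then add 2
    received.append(value)
    return value + 2

# size() -> Series indexed by (x, y), one integer row count per group
df1 = df.groupby(['x', 'y']).size()

# Series.apply calls f once per element, i.e. once per group *count*
df1[df1 > 1] = df1[df1 > 1].apply(f)

print(received)  # a list of counts, not sub-DataFrames
print(df1)
```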

See also the related question: What is the difference between size and count in pandas?
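Briefly, `size()` gives one row count per group (NaN rows included) as a Series, while `count()` gives the number of non-null values per column per group as a DataFrame. That is why `count() > 1` yields a table of booleans rather than a single membership test per group. A small sketch, using a hypothetical frame with one missing value:

```python
import numpy as np
import pandas as pd

# hypothetical frame: column 'C' has a missing value in the (1, 5) group
df = pd.DataFrame({'x': [1, 1, 3],
                   'y': [5, 5, 6],
                   'C': [7, np.nan, 9]})

g = df.groupby(['x', 'y'])

# size(): rows per group, NaN included; returns a Series
sizes = g.size()

# count(): non-null values per column per group; returns a DataFrame
counts = g.count()

print(sizes)
print(counts)
```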

Sample:

import pandas as pd

df = pd.DataFrame({'C':[7,8,9],
                   'x':[1,1,3],
                   'y':[5,5,6]})

print(df)
   C  x  y
0  7  1  5
1  8  1  5
2  9  3  6

def f(x):
    return x + 2

df1 = df.groupby(['x','y']).size().reset_index(name='COUNT')

s = df1[df1.COUNT > 1].set_index('x')['y']
print(s)
x
1    5
Name: y, dtype: int64

mask = df.set_index('x')['y'].isin(s).values
print(mask)
[ True  True False]
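One caveat with this mask: `isin(s)` compares only the `y` values, so a row whose `y` matches a duplicated group but whose `x` differs would also be selected. A sketch of a mask built on the (x, y) pair instead, using `transform('size')` (a swapped-in technique, not the answer's), on a hypothetical frame with such a row:

```python
import pandas as pd

# hypothetical frame: row 3 has y == 5 like the duplicated group, but x == 2
df = pd.DataFrame({'C': [7, 8, 9, 10],
                   'x': [1, 1, 3, 2],
                   'y': [5, 5, 6, 5]})

# Mask built as in the answer: compares y values only
df1 = df.groupby(['x', 'y']).size().reset_index(name='COUNT')
s = df1[df1.COUNT > 1].set_index('x')['y']
y_only_mask = df.set_index('x')['y'].isin(s).values

# transform('size') writes each (x, y) group's row count back onto its rows
pair_mask = (df.groupby(['x', 'y'])['C'].transform('size') > 1).values

print(y_only_mask)  # row 3 is matched because its y is 5
print(pair_mask)    # row 3 is not matched: its (x, y) group has one row
```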

df[mask] = df[mask].apply(f)
print(df)
    C  x  y
0   9  3  7
1  10  3  7
2   9  3  6
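If the goal is to modify only the rows that belong to groups with more than one member, `GroupBy.filter` is a more direct route (a swapped-in technique, not the answer's). The callable passed to `filter` receives each group as a sub-DataFrame, and the rows of groups for which it returns `True` are kept with their original index labels. A minimal sketch with the same sample data and `f`, which should reproduce the output above:

```python
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9],
                   'x': [1, 1, 3],
                   'y': [5, 5, 6]})

def f(x):
    return x + 2

# filter() calls the lambda once per (x, y) group with that group's sub-DataFrame
# and keeps the rows of groups for which it returns True
multi = df.groupby(['x', 'y']).filter(lambda g: len(g) > 1)

# Apply f only to the kept rows; index labels line up with df
df.loc[multi.index] = f(multi)
print(df)
```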
Graham
jezrael
  • In the function I do this: mask = x[K.side1].isin(x[K.side2]) | x[K.side2].isin(x[K.side1]). I get this error: TypeError: 'int' object is not subscriptable – CodeGeek123 Apr 07 '17 at 09:20
  • Hmm, the problem is with `x[K.side1]`. Can you add a data sample and explain what you need? Or is `K.side1` a `dict` that returns a scalar? Or is it a `Series`? – jezrael Apr 07 '17 at 09:27
  • Because it needs `x['col']`, but it is no problem if the column name is obtained dynamically. – jezrael Apr 07 '17 at 09:28
  • So you simply need a `Series` or column of `df` from `x[K.side1]`. – jezrael Apr 07 '17 at 09:32
  • So x[K.side1] = x['colname']; basically K.side1 is a column name. – CodeGeek123 Apr 07 '17 at 09:56
  • Please check the edited answer. I tried to use your code, but I am still not 100% sure it is what you want. If there is a problem, let me know. Thanks. – jezrael Apr 07 '17 at 10:43
  • Thanks, I think the problem is this: when I use size, it gets rid of every other column and keeps only the grouped columns and a size value. What I want is the whole original DataFrame passed into the function, grouped by x, y, where the group has more than one element. – CodeGeek123 Apr 07 '17 at 10:46
  • OK, I removed the unnecessary code and edited the question; please check it. – jezrael Apr 07 '17 at 10:54
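The last comments ask for each whole group of the original DataFrame, grouped by x and y, to be passed into the function, but only when the group has more than one element. One way to do that, sketched here with a hypothetical `f` that sums column `C` (a stand-in for the real logic, such as the `K.side1`/`K.side2` column checks), is to iterate over the GroupBy. Each iteration yields the group key and the group itself as a DataFrame with all original columns, which also shows what the function receives:

```python
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9],
                   'x': [1, 1, 3],
                   'y': [5, 5, 6]})

def f(group):
    # hypothetical f: group is a DataFrame holding every original column
    # for one (x, y) key, so column lookups like group['C'] work
    return group['C'].sum()

results = {}
# Iterating a GroupBy yields (key, sub-DataFrame) pairs; key is an (x, y) tuple here
for key, group in df.groupby(['x', 'y']):
    if len(group) > 1:  # only groups with more than one member
        results[key] = f(group)

print(results)
```

Iterating explicitly avoids relying on how `GroupBy.apply` treats the grouping columns, which has changed across pandas versions.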