Can .apply use information from other groups?

Question

For each element in a group, determine whether it is present in the next group (in the order in which the groups appear, not necessarily numerical order). For the last group, the result is all False.

Example:

import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 0, 2],
                   'val': ['a', 'b', 'a', 'c', 'c']})
grouped = df.groupby('group')

print(result)
0     True
1    False
2    False
3    False
4    False
Name: val, dtype: bool

What is the best way to do it? I can accomplish it like this, but it seems too hacky:

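# Group keys in the order groupby iterates over them.
keys = list(grouped.groups.keys())

# Shared iterator over the keys, offset by one: each call to f advances it,
# so this relies on f being called exactly once per group, in order.
iterator_keys = iter(keys[1:])
def f(ser):
    if ser.name == keys[-1]:
        return ser.isin([])  # last group: all False
    next_key = next(iterator_keys)
    return ser.isin(grouped.get_group(next_key)['val'])
result = grouped['val'].apply(f)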

By **next** group, do you mean the **next value** of the group key (0 -> 1), or the next group in order of appearance? (Here that is also 0 -> 1, but it could be different.) – mozway Aug 16 '22 at 19:35

score 3 · Accepted Answer · edited Aug 17 '22 at 00:27

Try:

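g = df.groupby("group")

# Set of values per group, shifted up so that each key holds the NEXT group's
# set (the last group gets an empty set).
m = g["val"].agg(set).shift(-1, fill_value=set())
# x.name is the current group's key; use it to look up the next group's set.
x = g["val"].transform(lambda x: x.isin(m[x.name]))
print(x)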

Prints:

0     True
1    False
2    False
3    False
4    False
Name: val, dtype: bool
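
The question asks for the "next" group in order of appearance, while groupby sorts the keys by default, so the two can disagree when the keys do not appear in sorted order. Below is a minimal sketch of a variant that makes the order explicit with sort=False. The data and the names in_next_group and next_sets are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical data: groups first appear in the order 2, 0, 1.
df = pd.DataFrame({"group": [2, 0, 0, 2, 1],
                   "val": ["a", "b", "a", "c", "c"]})

def in_next_group(frame, sort):
    # sort=False keeps groups in order of first appearance;
    # sort=True (the pandas default) orders them by key.
    g = frame.groupby("group", sort=sort)
    # One set of values per group, in the order groupby iterates.
    sets = {key: set(vals) for key, vals in g["val"]}
    keys = list(sets)
    # Map each key to the set of the group that follows it;
    # the last key gets no entry.
    next_sets = {k: sets[nxt] for k, nxt in zip(keys, keys[1:])}
    # s.name is the key of the group being transformed.
    return g["val"].transform(lambda s: s.isin(next_sets.get(s.name, set())))

by_appearance = in_next_group(df, sort=False)
by_key = in_next_group(df, sort=True)
print(by_appearance.tolist())  # [True, False, False, False, False]
print(by_key.tolist())         # [False, False, False, False, True]
```

With the question's own data the keys already appear in sorted order, so both settings give the same result there.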

Note:

If you want to fill the last group with some other value (not necessarily False), you can do this, where values is a placeholder for whatever you choose:

m = g["val"].agg(set).shift(-1)
x = g["val"].transform(lambda x: x.isin(m[x.name])
                                 if not pd.isnull(m[x.name])
                                 else values)

For example, with values = True, x will be:

0     True
1    False
2    False
3    False
4     True
Name: val, dtype: bool

edited Aug 17 '22 at 00:27 · Vladimir Fokow

answered Aug 16 '22 at 19:33 · Andrej Kesely

I thought the question was specific to accessing the group inside `apply` ;) Otherwise I'd also have used this approach! +1 – mozway Aug 16 '22 at 19:43
@mozway, can it be done? This answer solves my problem, but I'd also like to develop my general understanding. Now I feel that functions that act on a grouped `df` are not meant to use other groups at all, and that in general we must work out how to solve each such problem individually. Would you agree? – Vladimir Fokow Aug 16 '22 at 20:37
@VladimirFokow correct, they are not meant to, but you can cheat. You already have a good idea how to in your question ;) – mozway Aug 16 '22 at 20:40
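
One way to read the "cheat" mentioned above is to let the function close over the grouped object and look up another group by key. Here is a minimal sketch of that idea, not necessarily what the commenter had in mind. It replaces the shared iterator from the question with an explicit key-to-next-key mapping, so it does not depend on how often or in what order the function is called. The names next_key and in_next_group are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"group": [0, 1, 1, 0, 2],
                   "val": ["a", "b", "a", "c", "c"]})
grouped = df.groupby("group")

# Map each group key to the key of the group that follows it.
keys = list(grouped.groups)
next_key = dict(zip(keys, keys[1:]))  # the last key gets no entry

def in_next_group(ser):
    # ser.name is the key of the group currently being processed.
    nxt = next_key.get(ser.name)
    if nxt is None:
        return ser.isin([])  # last group: all False
    # Reach into another group through the enclosing `grouped` object.
    return ser.isin(grouped.get_group(nxt)["val"])

# transform is used here so that the result keeps the original row index.
result = grouped["val"].transform(in_next_group)
print(result.tolist())  # [True, False, False, False, False]
```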
