0

I have a dataframe and I want to get all rows grouped in id where after row with country = russia and month = march is followed by a line with country != russia

input dataframe:

data = {'fruit': ['pear','cucumber','cherry', 'apricot', 'pear','watermelon','pear','banana', 'pear', 'cherry','apple', 'melon', 'cherry','banana', 'kiwi', 'guava', 'banana'],
'country': ['france','russia', 'usa','russia', 'франция','russia','usa', 'russia', 'russia','ghana','russia', 'russia', 'albania','andorra', 'russia', 'russia', 'russia'],
'id': ['01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'month': ['january','september','january','january', 'september','march','march', 'november', 'march', 'january','january', 'march', 'january','july', 'march', 'march', 'april']        
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id', 'month'])

input dataframe image

I though the below should work, but it doesn't take into account month = march and i get incorrect result.

Can anyone see the problem

df.groupby("id")
   .filter(
       lambda x: x.loc[(x["country"].eq("russia") & x["month"].eq("march")).idxmax() + 1:, ["country"]]
                  .fillna("russia")
                  .ne("russia")
                  .any()))

output dataframe:

data = {'fruit': ['watermelon','pear','banana', 'pear', 'cherry','apple', 'melon', 'cherry','banana'],
'country': ['russia','usa', 'russia', 'russia','ghana','russia', 'russia', 'albania','andorra'],
'id': ['03','03','011', '011', '011','011', '6', '6','6'],
'month': ['march','march', 'november', 'march', 'january','january', 'march', 'january','july']        
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id', 'month'])

output dataframe image

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • Welcome to Stack Overflow! Your question is currently unclear. You have not told us what your input dataframe looks like, what your expected/desired output is, and how that differs from the result you are getting. To get good help from the community, please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). Additional information on providing reproducible pandas examples can be found [here](https://stackoverflow.com/questions/20109391). – AlexK May 13 '21 at 22:00
  • @AlexK thank you, i've just added necessary data –  May 13 '21 at 22:13

1 Answers1

0

IIUC try:

import pandas as pd

data = {'fruit': ['pear', 'cucumber', 'cherry', 'apricot', 'pear', 'watermelon',
                  'pear', 'banana', 'pear', 'cherry', 'apple', 'melon',
                  'cherry', 'banana', 'kiwi', 'guava', 'banana'],
        'country': ['france', 'russia', 'usa', 'russia', 'франция', 'russia',
                    'usa', 'russia', 'russia', 'ghana', 'russia', 'russia',
                    'albania', 'andorra', 'russia', 'russia', 'russia'],
        'id': ['01', '01', '01', '02', '02', '03', '03', '011', '011', '011',
               '011', '6', '6', '6', '5', '5', '5'],
        'month': ['january', 'september', 'january', 'january', 'september',
                  'march', 'march', 'november', 'march', 'january', 'january',
                  'march', 'january', 'july', 'march', 'march', 'april']
        }
df = pd.DataFrame(data, columns=['fruit', 'country', 'id', 'month'])

# Groupby ID then include groups that have
# at least 1 row that meet all conditions
out = df.groupby('id').filter(
    lambda g: (
            g['country'].eq('russia') & 
            g['month'].eq('march') &
            g['country'].shift(-1).ne('russia')
    ).any()
).reset_index(drop=True)

print(out)

out:

        fruit  country   id     month
0  watermelon   russia   03     march
1        pear      usa   03     march
2      banana   russia  011  november
3        pear   russia  011     march
4      cherry    ghana  011   january
5       apple   russia  011   january
6       melon   russia    6     march
7      cherry  albania    6   january
8      banana  andorra    6      july
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57