
I want to select rows based on multiple conditions. A row should be selected if at least one of the conditions is true.


def obtain(x):
    mask = (x['EucDistPoint'] >= x['EucDistPoint'].mean()) | (x['CRS'] >= 
            x['CRS'].mean()) | (x['CRC'] >= x['CRC'].mean())
    selected = x.loc[mask]
    return selected
selected = data.groupby('MMSI').apply(obtain)

Every row in the output should satisfy at least one of the conditions, but the output also seems to contain rows that satisfy none of them.

I've applied:

def obtain(x):
    mask = (x.EucDistPoint >= x.EucDistPoint.mean()) |\
        (x.CRS >= x.CRS.mean()) | (x.CRC >= x.CRC.mean())
    return x[mask]
selected = data.groupby('MMSI').apply(obtain) 

To check the output, I use this:

selected[selected['MMSI']==210161000].min()

but the output is like this:

MMSI                        210161000
BaseDateTime      2017-02-01 08:54:35
LAT                           34.2080
LON                         -125.9994
SOG                            1.1000
COG                         -194.3000
CRS                            0.0000
CRC                            0.0000
X                         230030.4090
Y                        3789274.2135
EucDistPoint                   0.0000
HaverDistPoint                 0.0000
dtype: object

This looks wrong to me, because the minimum values of CRS, CRC and EucDistPoint are 0.0022, 0.0446 and 551.887.

Danial
  • Can you add some example data, 5-10 rows? For example `print(x.head(10))`, and add that to your question. – Erfan Aug 05 '19 at 18:08
  • Possible duplicate of [*Select rows from a DataFrame based on values in a column in pandas*](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – Alexandre B. Aug 05 '19 at 18:12
  • Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – Jake P Aug 05 '19 at 18:12
  • I did. Now you can see it, @Erfan – Danial Aug 05 '19 at 18:20
  • Please do not post data as an image. Have a look at [*How to make good reproducible pandas examples*](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Alexandre B. Aug 05 '19 at 18:23
  • It's in your Jupyter notebook. Simply do `print(selected)` and post that output in your question. – Erfan Aug 05 '19 at 18:29
  • And what is the reason you use `groupby` here? – Erfan Aug 05 '19 at 18:31

1 Answer

Your code works "as is". You can also write it a bit shorter:

def obtain(x):
    mask = (x.EucDistPoint >= x.EucDistPoint.mean()) |\
        (x.CRS >= x.CRS.mean()) | (x.CRC >= x.CRC.mean())
    return x[mask]
data.groupby('MMSI').apply(obtain)

Example

My source DataFrame:

        MMSI  CRS     CRC  EucDistPoint
0  210161100  1.0  1.0000           0.0
1  210161100  0.0  0.0281         200.0
2  210161100  0.0  0.0530         589.1
3  210161200  1.0  1.0000           0.0
4  210161200  0.0  0.0281         500.0
5  210161200  0.0  0.0530         200.1

Mean values (data.groupby('MMSI').mean()):

                CRS       CRC  EucDistPoint
MMSI                                       
210161100  0.333333  0.360367    263.033333
210161200  0.333333  0.360367    233.366667

Conditions for particular columns (data.groupby('MMSI').transform(lambda x: x >= x.mean())):

             CRS    CRC  EucDistPoint
MMSI                                 
210161100   True   True         False
210161100  False  False         False
210161100  False  False          True
210161200   True   True         False
210161200  False  False          True
210161200  False  False         False

As you can see, rows 1 and 5 (counting from 0) have False in all three columns, so they should not be in the output.
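The same per-column check can also be turned directly into a row mask without a custom function. This is only a sketch of an alternative, not the code from the question: it rebuilds the sample DataFrame shown above, and the names `cols`, `group_means` and `flags` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the example table above.
data = pd.DataFrame({
    'MMSI': [210161100] * 3 + [210161200] * 3,
    'CRS': [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    'CRC': [1.0, 0.0281, 0.0530, 1.0, 0.0281, 0.0530],
    'EucDistPoint': [0.0, 200.0, 589.1, 0.0, 500.0, 200.1],
})

# Hypothetical helper name: the columns that take part in the conditions.
cols = ['CRS', 'CRC', 'EucDistPoint']

# Per-group mean of each column, aligned with the original row index.
group_means = data.groupby('MMSI')[cols].transform('mean')

# True where a value is at least its group mean.
flags = data[cols] >= group_means

# any(axis=1) is the row-wise OR of the three conditions.
mask = flags.any(axis=1)

selected = data[mask]
print(selected)
```

With this sample data the mask drops rows 1 and 5, the same rows that have False in all three columns above.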

And the result of either your function or mine:

                  MMSI  CRS     CRC  EucDistPoint
MMSI                                             
210161100 0  210161100  1.0  1.0000           0.0
          2  210161100  0.0  0.0530         589.1
210161200 3  210161200  1.0  1.0000           0.0
          4  210161200  0.0  0.0281         500.0

Just as it should be.
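Regarding the `.min()` check in the question, one thing that may explain a surprising result: the conditions are combined with `|`, so a selected row only has to pass one of them, and its other columns can still be below the group mean (row 0 above has EucDistPoint 0.0 but is kept because of CRS). The sketch below reproduces this on the sample data. It is an illustration under assumptions, not a diagnosis of the real data set, and it uses `pd.concat` over the groups instead of `groupby(...).apply(...)` only so that the MMSI column is kept regardless of the pandas version.

```python
import pandas as pd

# Sample data copied from the example table above.
data = pd.DataFrame({
    'MMSI': [210161100] * 3 + [210161200] * 3,
    'CRS': [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    'CRC': [1.0, 0.0281, 0.0530, 1.0, 0.0281, 0.0530],
    'EucDistPoint': [0.0, 200.0, 589.1, 0.0, 500.0, 200.1],
})


def obtain(x):
    # Keep rows where at least one column reaches its group mean.
    mask = (x.EucDistPoint >= x.EucDistPoint.mean()) | \
        (x.CRS >= x.CRS.mean()) | (x.CRC >= x.CRC.mean())
    return x[mask]


# Apply obtain() to each MMSI group and stack the results.
selected = pd.concat([obtain(group) for _, group in data.groupby('MMSI')])

# Column-wise minima over the selected rows of one vessel.
one_ship = selected[selected['MMSI'] == 210161100]
minima = one_ship[['CRS', 'CRC', 'EucDistPoint']].min()
print(minima)
```

Here the minimum of CRS and of EucDistPoint over the selected rows is 0.0, even though every selected row satisfies at least one condition. Column-wise minima are taken independently per column, so they say nothing about whether each row passed the OR test.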

Valdi_Bo
  • Thank you for your reply. Your output is exactly what I want. I added my output to my question. Can you please tell me why my output and your output are different? – Danial Aug 06 '19 at 07:11
  • My output is correct. I found out where my mistake was. – Danial Aug 06 '19 at 07:53