How to select some dataframe rows that contain an array of specific values in one of the columns?

Question

I have the following array of ids:

       import numpy as np
       import pandas as pd
       array_id = np.array([2,4,7])

I have the following dataset:

       df = pd.DataFrame({'Name': ['Station', 'Sensor', 'Station', 'Sensor', 
                                   'Sensor', 'Sensor', 'Sensor'], 
                          'Type': ['analog', 'dig', 'analog', 'analog', 
                                   'analog', 'analog', 'dig'],
                          'id': [1, 2, 3, 4, 5, 6, 7]})

I would like to select the rows of the dataframe (df) whose id belongs to the array of ids (array_id). I would like the output to be:

             Name   Type    id
           Sensor   dig     2
           Sensor   analog  4
           Sensor   dig     7

I managed to implement code that does this, but it needs two nested for loops:

      d = {'Name', 'Type', 'id'}

      df_aux = pd.DataFrame(d)
      df_select = pd.DataFrame(d)

      for i in range(0, len(df)):    
          for j in range(0, len(array_id)):
    
              if(df['id'].iloc[i] == array_id[j]):
    
                  array_aux = [(df['Name'].iloc[i], 
                                df['Type'].iloc[i], 
                                df['id'].iloc[i])]        
    
                  df_aux = pd.DataFrame(array_aux, columns = ['Name', 'Type', 'id'])
   
                  df_select = pd.concat([df_select, df_aux])

The output is:

      print(df_select)

        0     Name      Type    id
       id     NaN       NaN     NaN
       Type   NaN       NaN     NaN
       Name   NaN       NaN     NaN
        NaN   Sensor    dig     2.0
        NaN   Sensor    analog  4.0
        NaN   Sensor    dig     7.0

I would like to learn a way to do this without the two for loops, and so that df_select does not contain the NaN rows. Is there a way to solve this?

`df[df["id"].isin(array_id)]`? – not_speshal, Nov 17 '21 at 19:46

score 3 · Answer 1 · answered Nov 17 '21 at 19:49


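Use the `isin` method of a Series. It returns a boolean mask that is True for each row whose id appears in array_id, and that mask can be passed to `.loc` to select the rows:

       df.loc[df['id'].isin(array_id), :]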
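For reference, here is a minimal end-to-end sketch using the data from the question. The intermediate name `mask` and the call to `reset_index(drop=True)` are additions made for illustration only. Resetting the index is optional and simply renumbers the selected rows from 0.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({'Name': ['Station', 'Sensor', 'Station', 'Sensor',
                            'Sensor', 'Sensor', 'Sensor'],
                   'Type': ['analog', 'dig', 'analog', 'analog',
                            'analog', 'analog', 'dig'],
                   'id': [1, 2, 3, 4, 5, 6, 7]})
array_id = np.array([2, 4, 7])

# Boolean Series: True where the row's id is one of the values in array_id.
mask = df['id'].isin(array_id)

# Keep only the matching rows; reset_index is optional (illustrative choice).
df_select = df.loc[mask].reset_index(drop=True)

print(df_select)
```

Because the rows are taken directly from df, no empty placeholder frame is needed, so no NaN rows are introduced. Negating the mask with `~mask` would select the rows whose id is not in array_id.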

Kapocsi

