I have a dataset of bike trips in a city, with the start station ID, end station ID and duration of each trip. The data dates back to 2017, so some of the stations no longer exist, and I have a list of those station IDs. How can I remove the rows from the DataFrame that either start or end at one of those stations?

For example, if I want to remove StartStation Id = 135, which appears at indices 4 and 5, what should I do? This has to work on about a million rows, where 135 can appear anywhere.

    Bike Id  StartStation Id  EndStation Id  Duration
0       395              573          137.0     660.0
1     12931              399          507.0     420.0
2      7120              399          507.0     420.0
3      1198              599          616.0     300.0
4     10739              135          486.0    1260.0
5     10949              135          486.0    1260.0
6      8831              193          411.0     540.0
7      8778              266          770.0     600.0
8       700              137          294.0     540.0
9      5017              456           39.0    3000.0
10     4359              444          445.0     240.0
11     2801              288          288.0    5340.0
12     9525              265          592.0     300.0
Possible duplicate of [How to implement 'in' and 'not in' for Pandas dataframe](https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe) – jose_bacoy May 29 '19 at 17:39

1 Answer


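I'm calling your list of station IDs to remove removed_ids. Keep only the rows where neither the start station nor the end station is in that list (the column names below match the ones in your table):

# ~ negates isin(), so each condition is True for stations that still exist
df = df.loc[
    (~df['StartStation Id'].isin(removed_ids))
    & (~df['EndStation Id'].isin(removed_ids))
]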
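A self-contained sketch of the same approach, built from the sample rows in the question. The removed_ids values (135 plus a hypothetical 137, chosen so that both the start and end columns get exercised) and the helper name drop_closed_stations are illustrative assumptions, not part of the original data or answer.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame(
    {
        "Bike Id": [395, 12931, 7120, 1198, 10739, 10949, 8831,
                    8778, 700, 5017, 4359, 2801, 9525],
        "StartStation Id": [573, 399, 399, 599, 135, 135, 193,
                            266, 137, 456, 444, 288, 265],
        "EndStation Id": [137.0, 507.0, 507.0, 616.0, 486.0, 486.0, 411.0,
                          770.0, 294.0, 39.0, 445.0, 288.0, 592.0],
        "Duration": [660.0, 420.0, 420.0, 300.0, 1260.0, 1260.0, 540.0,
                     600.0, 540.0, 3000.0, 240.0, 5340.0, 300.0],
    }
)

# Hypothetical list of closed stations (137 added for illustration).
removed_ids = [135, 137]


def drop_closed_stations(frame, closed_ids):
    """Hypothetical helper: keep rows whose start and end stations are both open."""
    # isin compares by value, so integer IDs also match the float EndStation column.
    starts_closed = frame["StartStation Id"].isin(closed_ids)
    ends_closed = frame["EndStation Id"].isin(closed_ids)
    # Drop a row if either end of the trip is a closed station.
    return frame.loc[~(starts_closed | ends_closed)]


cleaned = drop_closed_stations(df, removed_ids)
print(cleaned.index.tolist())  # prints [1, 2, 3, 6, 7, 9, 10, 11, 12]
```

Row 0 is dropped because its end station is 137.0, and rows 4, 5 and 8 because of their start stations. The result keeps the original index labels; call .reset_index(drop=True) on it if you want them renumbered from 0. Since isin operates on whole columns at once, there is no Python-level loop over the rows.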