I have a list like this:
list1 = ['4361', '1856', '57586', '79017', '972', '974', '1829', '10787', '85477', '57019', '7431', '53616', '26228', '29085', '5217', '5527']
And then I have two columns of a data frame like this:
print(df['col A'][0:10])
0 6416
1 84665
2 90
3 2624
4 6118
5 375
6 377
7 377
9 351
10 333
print(df['col B'][0:10])
0 2318
1 88
2 2339
3 5371
4 6774
5 23163
6 23647
7 27236
9 10513
10 1600
I want to return only the rows of the data frame where an item in the list appears in either col A or col B.
I could imagine how to do this iteratively, something like this:
for each_item in list1:
    for i, row in df.iterrows():
        if each_item in row['col A']:
            print(row)
        if each_item in row['col B']:
            print(row)
I'm just wondering if there's a neater way to do it where I don't have to continually loop through the dataframe, as both the list and the dataframe are quite big.
I saw this code snippet online, which returns the rows where df['col A'] equals one value OR df['col B'] equals another value:
print(df[(df["col A"] == 1) | (df["col B"] == 2)])
I'm just unsure how to adapt this to test whether the value is in a list rather than equal to a single value. Can someone show me how to incorporate this kind of idea into my code, or do people think my original code snippet (using .iterrows()) is the best way?
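For reference, here is a minimal sketch of how the `|` idea might extend to a list, using `Series.isin` to build one boolean mask per column. The small `df` below is a hypothetical stand-in with made-up values, not my real data, and the `astype(str)` step assumes the columns hold integers while `list1` holds strings; if the dtypes already match, that conversion would not be needed.

```python
import pandas as pd

# A shortened copy of the list from the question (strings).
list1 = ['4361', '1856', '57586', '79017', '972', '974', '1829', '10787']

# Hypothetical stand-in for the real data frame; values are made up
# so that some rows match the list.
df = pd.DataFrame({
    'col A': [6416, 972, 90, 2624, 1856],
    'col B': [2318, 88, 4361, 5371, 974],
})

# Assumption: the columns are integers, so compare on a string view
# of each column to match the string items in list1.
wanted = set(list1)
mask = df['col A'].astype(str).isin(wanted) | df['col B'].astype(str).isin(wanted)

# Keep only the rows where either column matched.
matches = df[mask]
print(matches)
```

This checks each column once against the whole list instead of looping over every row for every item, so it should scale better when both the list and the data frame are large.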