I want to delete the rows of a CSV file whose names are not in a list. I have a list of directory names, and those names are the values in the name column of the CSV file.
import os
dirs = os.listdir('foo/')
dirs
['boo', 'aoo', 'coo', 'doo']
I want to delete the rows whose names are not in the dirs list.
import pandas as pd
file = pd.read_csv('tada.csv')
file.head()
  name  height  weight gender
0  aoo     212     253      M
1  boo     175     243      M
2  coo     190     244      M
3  doo     162     288      F
4  too     222     240      M
I tried collecting the index labels of the matching rows:
index = []
idx = []
for dname in dirs:
    a = file.index[file['name'] == dname].tolist()
    index.append(a)
for i in index:
    for j in i:
        idx.append(j)
print(idx)
[1, 0, 2, 3]
Then I used df.drop to drop those index labels, but it drops the rows that I want to keep instead of the ones I want to delete.
for i in idx:
    file.drop(i, axis=0, inplace=True)
print(file)
  name  height  weight gender
4  too     222      24      M
5  yoo     272     230      F
6  poo     200      23      F
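The idx list holds the labels of the rows whose names are in dirs, so dropping those labels removes exactly the rows that were meant to stay. As a minimal sketch of one possible way around this, assuming the goal is to keep only the rows whose name appears in dirs, a boolean mask built with Series.isin can either select the matching rows or be inverted to drop the others. The inline dirs list and DataFrame below are stand-ins for os.listdir('foo/') and pd.read_csv('tada.csv'), built from the values shown by file.head(); mask, kept, and dropped are illustrative names only.

```python
import pandas as pd

# Stand-in for os.listdir('foo/'), using the names listed earlier.
dirs = ['boo', 'aoo', 'coo', 'doo']

# Stand-in for pd.read_csv('tada.csv'), using the five rows shown by file.head().
file = pd.DataFrame({
    'name': ['aoo', 'boo', 'coo', 'doo', 'too'],
    'height': [212, 175, 190, 162, 222],
    'weight': [253, 243, 244, 288, 240],
    'gender': ['M', 'M', 'M', 'F', 'M'],
})

# Boolean mask: True where the row's name is in dirs.
mask = file['name'].isin(dirs)

# Option 1: select only the matching rows.
kept = file[mask]

# Option 2: drop the index labels of the rows that do NOT match.
dropped = file.drop(file.index[~mask])

print(kept)
```

Both options leave the original frame untouched and return a new one. Reassigning the result (for example file = file[mask]) would replace it, if that is what is wanted.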