Pandas: Deleting rows that are not in specific index

Question

I want to delete the rows of a CSV file whose name is not in a given list. I have a list of directory names, and those names appear in the name column of the CSV file.

dirs = os.listdir('foo/')
dirs
['boo', 'aoo', 'coo', 'doo']

I want to delete the rows whose names are not in the dirs list.

file = pd.read_csv('tada.csv')
file.head()
    name    height  weight  gender
0   aoo      212    253       M
1   boo      175    243       M
2   coo      190    244       M
3   doo      162    288       F
4   too      222    240       M

I tried this with index:

index = []
idx = []
for dname in dirs:
    a = file.index[file['name'] == dname].tolist()
    index.append(a)

for i in index:
    for j in i:
        idx.append(j)
print(idx)

[1, 0, 2, 3]

Then I used df.drop with those index values, but it drops the rows that I want to keep.

for i in idx:
    file.drop(i,axis=0,inplace=True)
print(file)

  name  height  weight gender
4  too     222      24      M
5  yoo     272     230      F
6  poo     200      23      F

Seems like you are looking for `file = file[file['name'].isin(dirs)]`, no? — ouroboros1, Oct 06 '22 at 12:58
I think it will solve your question `file.drop(file.index[~file.index.isin(idx)], inplace=True)` — Muhammed Erem, Oct 06 '22 at 13:29
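
The loop in the question drops the wrong rows because idx holds the labels of the rows whose name is in dirs, and drop removes exactly the labels it is given. Below is a minimal sketch of the two suggestions from the comments. It assumes the data is rebuilt inline from the question's printed output (name, height and gender only) instead of being read from tada.csv, and that dirs is hard-coded instead of coming from os.listdir. The names mask, kept and kept_via_drop are illustrative and not from the original post.

```python
import pandas as pd

# Stand-ins for os.listdir('foo/') and pd.read_csv('tada.csv');
# values are copied from the output shown in the question.
dirs = ['boo', 'aoo', 'coo', 'doo']
file = pd.DataFrame({
    'name':   ['aoo', 'boo', 'coo', 'doo', 'too', 'yoo', 'poo'],
    'height': [212, 175, 190, 162, 222, 272, 200],
    'gender': ['M', 'M', 'M', 'F', 'M', 'F', 'F'],
})

# Boolean mask: True for rows whose name appears in dirs.
mask = file['name'].isin(dirs)

# Option 1: keep only the matching rows.
kept = file[mask]

# Option 2: drop the complement, i.e. the labels of rows NOT in dirs.
kept_via_drop = file.drop(file.index[~mask])

print(kept)
```

Both options leave the rows aoo, boo, coo and doo. Filtering with the mask is usually the simpler of the two, since it avoids collecting index labels at all.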
