How to keep only a certain set of rows by index in a pandas DataFrame

Question

I have a DataFrame I created by doing the following manipulations to a .fits file:

data_dict= dict()
for obj in sortedpab:
    for key in ['FIELD', 'ID',  'RA' , 'DEC' , 'Z_50', 'Z_84','Z_16' , 'PAB_FLUX', 'PAB_FLUX_ERR']:
        data_dict.setdefault(key, list()).append(obj[key])

gooddf = pd.DataFrame(data_dict)
gooddf['Z_ERR'] = ((gooddf['Z_84'] - gooddf['Z_50']) + (gooddf['Z_50'] - gooddf['Z_16'])) / (2 * gooddf['Z_50'])
gooddf['OBS_PAB'] = 12820 * (1 + gooddf['Z_50'])
gooddf.loc[gooddf['FIELD'] == "ERS" , 'FIELD'] = "ERSPRIME"
gooddf = gooddf[['FIELD', 'ID', 'RA', 'DEC', 'Z_50', 'Z_ERR', 'PAB_FLUX', 'PAB_FLUX_ERR', 'OBS_PAB']]
gooddf = gooddf[gooddf.OBS_PAB <= 16500]

This gives me a DataFrame with 351 rows and 9 columns. I would like to keep only the rows with certain index values, and I thought of doing something like this:

indices = [5 , 6 , 9 , 10]
gooddf = gooddf[gooddf.index == indices]

where I would like it to keep only the rows with the index values listed in the array indices, but this is giving me issues.

I found a way to do this with a for loop:

good = np.array([5 , 6 , 9 , 12 , 14 , 15 , 18 , 21 , 24 , 29 , 30 , 35 , 36 , 37 , 46 , 48 ])

gooddf50 = pd.DataFrame()
for i in range(len(good)):
    gooddf50 = gooddf50.append(gooddf[gooddf.index == good[i]])

Any thoughts on how to do this in a better way, preferably using just pandas?

@ALollz I am not sure what this is doing, but it returns indices 10, 11, 15 when I use `good = np.array([5 , 6 , 9])` and then `gooddf.iloc[good]` — Nikko Cleri, Oct 22 '19 at 11:59

score 23 · Accepted Answer · answered Oct 25 '19 at 01:09

This will do the trick:

gooddf.loc[indices]
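
As a minimal sketch of that call, assuming a small stand-in frame (the `df` below and its values are made up for illustration, not the asker's data), selecting by a list of labels might look like this. The `Index.isin` variant is an addition here, not part of the original answer: it can help when some requested labels may be absent, since `.loc` with a list raises a `KeyError` for missing labels.

```python
import pandas as pd

# Hypothetical stand-in for gooddf: filtering leaves gaps in the index labels.
df = pd.DataFrame({"OBS_PAB": [15000, 17000, 15500, 16000, 18000, 14000, 15200]})
df = df[df["OBS_PAB"] <= 16500]   # surviving labels: 0, 2, 3, 5, 6

indices = [2, 5, 6]
kept = df.loc[indices]            # rows whose index labels are 2, 5 and 6

# If some requested labels might be missing, a boolean mask avoids a KeyError.
wanted = [2, 5, 99]               # 99 is not a label in df
kept_present = df[df.index.isin(wanted)]
```

Note that `.loc[indices]` returns the rows in the order given in the list, while the `isin` mask keeps the frame's existing row order.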

An important note: .iloc and .loc are doing slightly different things, which is why you may be getting unexpected results.

The pandas documentation on indexing covers the details, but the key thing to understand is that .iloc returns rows according to the positions specified, whereas .loc returns rows according to the index labels specified. So whenever the index labels don't match the row positions (for example after filtering out rows, as in your code, or when the index is unsorted), .loc and .iloc will return different rows.
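
To make the difference concrete, here is a small sketch on a made-up frame (the name `df` and its values are illustrative assumptions). After a filter removes a row, the remaining labels no longer line up with positions, so the same list of integers selects different rows under `.loc` and `.iloc`.

```python
import pandas as pd

# Hypothetical frame whose labels stop matching positions after filtering.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})
df = df[df["value"] != 20]        # remaining labels: 0, 2, 3, 4, 5

by_label = df.loc[[2, 3]]         # rows labelled 2 and 3 (values 30 and 40)
by_position = df.iloc[[2, 3]]     # third and fourth rows (labels 3 and 4)
```

This is likely what happened in the comment above, where `gooddf.iloc[good]` returned rows with other index values. If positional and label-based selection should agree again, `df.reset_index(drop=True)` renumbers the labels from 0.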
