I have a DataFrame I created by doing the following manipulations to a .fits file:

data_dict= dict()
for obj in sortedpab:
    for key in ['FIELD', 'ID',  'RA' , 'DEC' , 'Z_50', 'Z_84','Z_16' , 'PAB_FLUX', 'PAB_FLUX_ERR']:
        data_dict.setdefault(key, list()).append(obj[key])

gooddf = pd.DataFrame(data_dict)
gooddf['Z_ERR'] = ((gooddf['Z_84'] - gooddf['Z_50']) + (gooddf['Z_50'] - gooddf['Z_16'])) / (2 * gooddf['Z_50'])
gooddf['OBS_PAB'] = 12820 * (1 + gooddf['Z_50'])
gooddf.loc[gooddf['FIELD'] == "ERS" , 'FIELD'] = "ERSPRIME"
gooddf = gooddf[['FIELD', 'ID', 'RA', 'DEC', 'Z_50', 'Z_ERR', 'PAB_FLUX', 'PAB_FLUX_ERR', 'OBS_PAB']]
gooddf = gooddf[gooddf.OBS_PAB <= 16500]

This gives me a DataFrame with 351 rows and 9 columns. I would like to keep only the rows with certain index values, and I tried something like this:

indices = [5 , 6 , 9 , 10]
gooddf = gooddf[gooddf.index == indices]

The intent is to keep only the rows whose index values appear in the list indices, but this is giving me issues.

I found a way to do this with a for loop:

good = np.array([5 , 6 , 9 , 12 , 14 , 15 , 18 , 21 , 24 , 29 , 30 , 35 , 36 , 37 , 46 , 48 ])

gooddf50 = pd.DataFrame()
for i in range(len(good)):
    gooddf50 = gooddf50.append(gooddf[gooddf.index == good[i]])

Any thoughts on how to do this in a better way, preferably using just pandas?

Nikko Cleri

1 Answer

This will do the trick:

gooddf.loc[indices]
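
As a minimal sketch on a toy frame (the column values and index labels below are made up for illustration, not taken from the question), .loc with a list of labels keeps exactly those rows. One caveat: in recent pandas versions .loc raises a KeyError if any listed label is missing from the index, which can easily happen after rows have been filtered out. Filtering with index.isin is a tolerant alternative that simply skips labels that are not present.

```python
import pandas as pd

# Toy stand-in for gooddf; the column values and index labels are hypothetical.
df = pd.DataFrame({"Z_50": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]},
                  index=[5, 6, 7, 9, 10, 12])

indices = [5, 6, 9, 10]

# Label-based selection: keeps the rows whose index label is in `indices`.
by_label = df.loc[indices]

# Tolerant variant: labels that are missing from the index are skipped.
wanted = [5, 6, 8, 9]  # 8 is not an index label in this toy frame
by_isin = df[df.index.isin(wanted)]
```

Note that .loc returns rows in the order of the list you pass, while the isin mask keeps the original row order of the DataFrame.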

An important note: .iloc and .loc do slightly different things, which is why you may be getting unexpected results.

The pandas indexing documentation covers the details, but the key thing to understand is that .iloc returns rows according to the integer positions specified, whereas .loc returns rows according to the index labels specified. The two agree only when the index labels coincide with the row positions (0, 1, 2, ...). If your index isn't sorted, or has gaps because rows were filtered out (as with the OBS_PAB filter in your code), .loc and .iloc will return different rows.
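
To illustrate the difference, here is a small sketch with made-up OBS_PAB values (none of these numbers come from the question). Filtering rows leaves gaps in the default integer index, so the same integers mean different rows depending on whether they are read as labels or as positions.

```python
import pandas as pd

# Hypothetical data; the default index is 0..4 before filtering.
df = pd.DataFrame({"OBS_PAB": [15000, 17000, 16000, 18000, 16400]})

# Same kind of filter as in the question; remaining index labels are 0, 2, 4.
filtered = df[df.OBS_PAB <= 16500]

# .loc looks up the index labels 2 and 4.
loc_rows = filtered.loc[[2, 4]]

# .iloc looks up positions 0 and 1, i.e. the first two remaining rows.
iloc_rows = filtered.iloc[[0, 1]]
```

Here loc_rows holds the rows labelled 2 and 4, while iloc_rows holds the rows labelled 0 and 2.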

Carolyn Conway