How do I delete rows in a pandas dataframe based on array of row indexes

Question

I have a data frame of points. The first two columns are positions. I am filtering the data based on each point's proximity to the other points. I calculate the distances between all the points with cdist and then filter the result to find the indices of points that are less than 0.5 from each other. I first have to apply two small filters to these indices. The first removes indices that compare a point with itself: distance [n,n] is always zero, and I don't want to remove all of my points. The second removes the duplicates that come from symmetric comparisons, since distance [n,m] = distance [m,n]. This leaves roughly double the number of entries that I need, so I use unique to filter out the duplicates.

My question: loc_find is a numpy array of indexes of the rows that should be removed. How do I use this array to remove those rows from my pandas dataframe without iterating over the dataframe?

from scipy.spatial.distance import cdist
import numpy as np
import pandas as pd
# make points and calculate distances
east=data['easting'].values
north=data['northing'].values
points=np.vstack((east,north)).T
distances=cdist(points,points) # big row x row matrix
zzzz=np.where(distances<0.5)

loc_dist=np.vstack((zzzz[0],zzzz[1])).T  # array of index pairs for points that are
# too close together and will be filtered. It still contains unwanted
# comparisons, such as data[1,1] with data[1,1], which is always zero
# since it is the same point. Also, distance [1,2] is the same as [2,1].

# My code for filtering the indices
loc_dist=loc_dist.astype('int')
diff_loc=zzzz[0]-zzzz[1] # zero where a point is compared with itself,
                         # i.e. distance [n,n]
diff_zero=np.where(diff_loc==0)
loc_dist_s=np.delete(loc_dist, diff_zero[0],axis=0)
loc_find=np.unique(loc_dist_s) # collapse the duplicates from symmetric
                               # comparisons distance [n,m] = distance [m,n]
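
For reference, the index filtering described above can be sketched more compactly. This is a minimal sketch on made-up coordinates, not the code from the question: the `points` values are invented for illustration, and NumPy broadcasting stands in for cdist so the example depends only on NumPy.

```python
import numpy as np

# Toy coordinates (made up); rows 0/1 and rows 3/4 lie within 0.5 of each other.
points = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [9.0, 9.0], [9.2, 9.1]])

# Pairwise Euclidean distances; stands in for cdist(points, points).
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

rows, cols = np.where(distances < 0.5)
# Drop self-comparisons [n, n]. The matrix is symmetric, so the remaining
# row indices already include every point that has a close neighbour.
loc_find = np.unique(rows[rows != cols])
```

Because each close pair appears as both [n,m] and [m,n], taking the unique row indices should give the same set of positions as applying unique to the full array of pairs.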

@EdChum Thanks for the suggestion about loc_find. This led me to another question on GitHub that has the answer. — Michael Wallace, May 10 '16 at 19:33

score 0 · Answer 1 · edited May 23 '17 at 10:33

Thanks to @EdChum I found these two answered questions which work for me.

A faster alternative to Pandas `isin` function

Select rows from a DataFrame based on values in a column in pandas

I just needed to copy the dataframe index into a column with

data.loc[:,'rindex1']=data.index.values

and then to remove the rows use the following

data_df2=data.loc[~data['rindex1'].isin(loc_find)]
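
A possible alternative that skips the helper column: loc_find holds row positions rather than index labels, so the isin approach relies on the index being the default 0..n-1 range. The sketch below removes rows by position instead. It is a minimal sketch under assumptions: the frame contents, the index labels, and the loc_find values are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); column names follow the question.
data = pd.DataFrame(
    {"easting": [0.0, 0.1, 5.0, 9.0, 9.2],
     "northing": [0.0, 0.1, 5.0, 9.0, 9.1]},
    index=[10, 11, 12, 13, 14],  # non-default labels, to show positions still work
)
loc_find = np.array([0, 1, 3, 4])  # row positions to remove (assumed values)

# Option 1: translate positions to labels, then drop (needs unique index labels).
dropped = data.drop(data.index[loc_find])

# Option 2: build a boolean mask over positions and keep the unmarked rows.
keep = np.ones(len(data), dtype=bool)
keep[loc_find] = False
masked = data[keep]
```

Both options avoid iterating over the dataframe. The mask version does not depend on the index labels at all, which may matter if the frame has been filtered or re-indexed earlier.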

Hope this helps someone else.
