
I have a point cloud of 6 million x, y and z points that I need to process. I need to find specific points among these 6 million points, and I have been using the pandas df.isin() function to do it. I first save the 6 million points into a pandas DataFrame (named point_cloud) and the specific points I need to look for into another DataFrame (named specific_point). There are only two specific points I need to find, so the combined df.isin() check should give 2 True values, but it gives 3 instead.

To prove that the 3 True values are wrong, I iterated through the 6 million points looking for the two specific points using iterrows(). The result was indeed 2 True values. So why does df.isin() give 3 instead of the correct result of 2?

I tried this, which gives a true_count of 3:

label = (point_cloud['x'].isin(specific_point['x']) & point_cloud['y'].isin(specific_point['y']) & point_cloud['z'].isin(specific_point['z'])).astype(int).to_frame()
true_count = 0

for index, t_f in label.iterrows():
    if int(t_f.iloc[0]) == 1:
        true_count += 1

print(true_count)
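A minimal reproduction on a tiny, made-up point cloud (the coordinates below are hypothetical, not the real data) shows the same pattern: two specific points, but three True values. It also counts the boolean mask directly with `.sum()` instead of looping with iterrows():

```python
import pandas as pd

# Hypothetical data: two target points plus one "mixed" point (1, 4, 6)
# whose x comes from the first target and whose y, z come from the second.
point_cloud = pd.DataFrame({
    "x": [1.0, 2.0, 1.0, 9.0],
    "y": [3.0, 4.0, 4.0, 9.0],
    "z": [5.0, 6.0, 6.0, 9.0],
})
specific_point = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

# Same check as above: each isin() tests one column on its own.
mask = (
    point_cloud["x"].isin(specific_point["x"])
    & point_cloud["y"].isin(specific_point["y"])
    & point_cloud["z"].isin(specific_point["z"])
)

# Summing a boolean Series counts its True values.
true_count = int(mask.sum())
print(true_count)  # 3
```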

I also tried this, which also gives a true_count of 3:

true_count = 0

for t_f in (point_cloud['x'].isin(specific_point['x']) & point_cloud['y'].isin(specific_point['y']) & point_cloud['z'].isin(specific_point['z'])).values:
    if t_f:
        true_count += 1

print(true_count)

Lastly, I tried the most inefficient approach of iterating through all 6 million points with iterrows(), and this gives the correct true_count of 2:

true_count = 0

for index_sp, sp in specific_point.iterrows():
    for index_pc, pc in point_cloud.iterrows():
        if sp['x'] == pc['x'] and sp['y'] == pc['y'] and sp['z'] == pc['z']:
            true_count += 1

print(true_count)
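The nested loop compares x, y and z of each specific point against the same row of the point cloud. A sketch of that row-wise logic without iterating over every cloud row, assuming there are only a handful of specific points (so looping over those is cheap), builds one vectorized mask per specific point and ORs them together. The data here is hypothetical, with (1, 4, 6) mixing coordinates from the two targets:

```python
import pandas as pd

# Hypothetical example data.
point_cloud = pd.DataFrame({
    "x": [1.0, 2.0, 1.0, 9.0],
    "y": [3.0, 4.0, 4.0, 9.0],
    "z": [5.0, 6.0, 6.0, 9.0],
})
specific_point = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

# Start with all False, then OR in one vectorized comparison per specific point.
mask = pd.Series(False, index=point_cloud.index)
for sp in specific_point.itertuples(index=False):
    # x, y and z are compared on the same row, as in the nested loop.
    mask |= (
        (point_cloud["x"] == sp.x)
        & (point_cloud["y"] == sp.y)
        & (point_cloud["z"] == sp.z)
    )

true_count = int(mask.sum())
print(true_count)  # 2
```

The loop now runs once per specific point (twice here) instead of once per pair of rows.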

Does anyone know why df.isin() behaves this way, or have I overlooked something?

Denzel
  • [Provide a copy of the DataFrame](https://stackoverflow.com/questions/52413246/how-do-i-provide-a-reproducible-copy-of-my-existing-dataframe) – Trenton McKinney Aug 11 '19 at 18:29

1 Answer


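Combining isin() on multiple columns with & does not check the DataFrame row by row. Each column is tested on its own against all of the values in the matching column of specific_point, so the result is more like checking against the product of those value lists (every combination of the x, y and z values).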

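So what you can do instead is an inner merge on all three columns, which only keeps rows where x, y and z match a specific point together: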

checked = point_cloud.merge(specific_point, on=['x', 'y', 'z'], how='inner')

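For example, if you have two lists l1 = [1, 2] and l2 = [3, 4], combining isin() on the two columns will match any row equal to [1, 3], [1, 4], [2, 3] or [2, 4], even though the intended points are only [1, 3] and [2, 4].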
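A two-column sketch of that example, using hypothetical x and y columns that hold l1 and l2 as the coordinates of the points (1, 3) and (2, 4), shows the combined isin() mask accepting all four combinations, while the inner merge keeps only rows that match a whole point:

```python
import pandas as pd

l1, l2 = [1, 2], [3, 4]
# The two "specific points" are (1, 3) and (2, 4).
specific_point = pd.DataFrame({"x": l1, "y": l2})

# Every combination of l1 and l2 values, plus one row that matches nothing.
df = pd.DataFrame({"x": [1, 1, 2, 2, 5], "y": [3, 4, 3, 4, 5]})

# Column-wise isin(): passes every pair from the product of l1 and l2.
isin_mask = df["x"].isin(specific_point["x"]) & df["y"].isin(specific_point["y"])
print(isin_mask.tolist())  # [True, True, True, True, False]

# Inner merge on both columns: keeps only rows equal to a whole specific point.
checked = df.merge(specific_point, on=["x", "y"], how="inner")
print(checked.values.tolist())  # [[1, 3], [2, 4]]
```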

BENY
  • Hello WeNYoBen, the `df.merge` function will sift out the specific points from the 6 million points, but what I need to do is find these specific points and make changes to them, e.g. changing the RGB values of these points. I am trying to look for the points among the 6 million points and, `if True`, make changes to these points. Thanks. – Denzel Aug 12 '19 at 05:34
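One way to get a row-wise boolean mask that can be used for assignment in place, rather than a separate merged frame, is to compare whole (x, y, z) tuples with pandas `MultiIndex.isin`. This is a sketch, not part of the answer above: the `r`, `g`, `b` column names and the new colour are hypothetical, and it assumes the coordinates match exactly, just as the `==` comparison in the question's loop does:

```python
import pandas as pd

# Hypothetical point cloud with colour columns r, g, b.
point_cloud = pd.DataFrame({
    "x": [1.0, 2.0, 1.0, 9.0],
    "y": [3.0, 4.0, 4.0, 9.0],
    "z": [5.0, 6.0, 6.0, 9.0],
    "r": [10, 20, 30, 40],
    "g": [10, 20, 30, 40],
    "b": [10, 20, 30, 40],
})
specific_point = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

cols = ["x", "y", "z"]
# Treat each row's (x, y, z) as one key, so all three values must match together.
cloud_keys = pd.MultiIndex.from_frame(point_cloud[cols])
target_keys = list(specific_point[cols].itertuples(index=False, name=None))
mask = cloud_keys.isin(target_keys)  # NumPy boolean array, one entry per row

print(int(mask.sum()))  # 2

# Recolour only the matching points (here: pure red).
point_cloud.loc[mask, ["r", "g", "b"]] = [255, 0, 0]
```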