Intersection of n-dimensional data sets in Python

Question

I am currently working with the following data:
Data_A: 10,000 samples, each with 170 features
Data_B: 1,000 samples, each with the same 170 features

If we plot Data_A in a 170-dimensional space, it will cover some region of that space. I just want to know what percentage of the samples in Data_B fall within that region. I do not need to visualize anything; I just need the subset.

(Actually, in Data_B I have included 800 samples that are similar to samples in Data_A and 200 samples that are quite different from samples in Data_A.)
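One way to make "belongs to that space" concrete, offered only as a rough sketch and not as an established answer: count a Data_B sample as inside if its nearest neighbour in Data_A is no farther away than Data_A samples usually are from their own nearest neighbours. Everything below is an assumption made for illustration: the synthetic stand-in arrays, the helper name `percent_inside`, the standardisation by Data_A statistics, and the 0.99 quantile cut-off. Only NumPy is used, and the stand-in for Data_A has 2,000 rows so that the full distance matrix fits comfortably in memory.

```python
import numpy as np


def percent_inside(data_a, data_b, quantile=0.99):
    """Percent of data_b rows whose nearest-neighbour distance to data_a
    is within a threshold calibrated on data_a itself (hypothetical helper)."""
    # Standardise both sets using Data_A statistics only.
    mean = data_a.mean(axis=0)
    std = data_a.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    a = (data_a - mean) / std
    b = (data_b - mean) / std

    def sq_dists(x, y):
        # Pairwise squared Euclidean distances: |x|^2 + |y|^2 - 2 x.y
        d = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :]
        d -= 2.0 * (x @ y.T)
        return np.maximum(d, 0.0)  # clip tiny negatives from rounding

    # Nearest-neighbour distance of every Data_A row to the rest of Data_A.
    d_aa = sq_dists(a, a)
    np.fill_diagonal(d_aa, np.inf)  # ignore each row's distance to itself
    threshold = np.quantile(d_aa.min(axis=1), quantile)

    # A Data_B row counts as inside if its nearest Data_A row is close enough.
    inside = sq_dists(b, a).min(axis=1) <= threshold
    return 100.0 * inside.mean(), inside


# Synthetic stand-ins (assumption): 800 similar rows, then 200 shifted rows.
rng = np.random.default_rng(0)
data_a = rng.normal(0.0, 1.0, size=(2000, 170))
data_b = np.vstack([
    rng.normal(0.0, 1.0, size=(800, 170)),  # similar to Data_A
    rng.normal(3.0, 1.0, size=(200, 170)),  # quite different from Data_A
])

pct, inside = percent_inside(data_a, data_b)
print(f"{pct:.1f}% of Data_B falls inside the region covered by Data_A")
subset = data_b[inside]  # the Data_B rows judged to be inside
```

On this synthetic data the figure should land near the 80 percent that were planted as similar. The quantile is still a choice, but it is calibrated from Data_A alone. For the full 10,000 rows the all-pairs matrix would probably need to be computed in chunks or replaced by a neighbour-search library.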

I have tried OneClassSVM, but it is not giving good results; moreover, its results depend entirely on its parameters (nu, gamma, kernel, etc.). I would have to tune a model like this every time I get a new set of training and testing data, which I don't want to do.
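For reference, the kind of attempt described above might look roughly like the following with scikit-learn's `OneClassSVM`. The synthetic arrays, the fixed `nu=0.05`, and the three `gamma` values are assumptions chosen only to show how the reported percentage depends on the parameters; they are not the settings used on the real data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-ins (assumption): the real Data_A and Data_B are not shown.
rng = np.random.default_rng(0)
data_a = rng.normal(0.0, 1.0, size=(1000, 170))
data_b = np.vstack([
    rng.normal(0.0, 1.0, size=(800, 170)),  # similar to Data_A
    rng.normal(3.0, 1.0, size=(200, 170)),  # quite different from Data_A
])

# Scale both sets with statistics learned from Data_A only.
scaler = StandardScaler().fit(data_a)
a_scaled = scaler.transform(data_a)
b_scaled = scaler.transform(data_b)

# Fit one model per gamma and record the share of Data_B predicted as inliers.
results = {}
for gamma in (1e-4, 1e-2, 1.0):
    model = OneClassSVM(kernel="rbf", nu=0.05, gamma=gamma).fit(a_scaled)
    pred = model.predict(b_scaled)  # +1 for inliers, -1 for outliers
    results[gamma] = 100.0 * float(np.mean(pred == 1))

for gamma, pct in results.items():
    print(f"gamma={gamma:g}: {pct:.1f}% of Data_B predicted inside")
```

Comparing the printed percentages across the `gamma` values gives a quick picture of the tuning burden mentioned above.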

Is there any other easy technique or model to do this in Python? Is there any Python module that can do this using set theory?

Pardon me if I have not explained the problem statement clearly.

Yes, I am using pandas to read those data sets (Data_A and Data_B). - Blessy, Jun 15 '17 at 11:24
Maybe [this](https://stackoverflow.com/questions/17095101/outputting-difference-in-two-pandas-dataframes-side-by-side-highlighting-the-d) can help. - Anirudh Sridhar, Jun 15 '17 at 11:32
Thanks @Anirudh, but the question you linked deals with differences in the values of particular features of samples, whereas what I am looking for is the intersection of the space occupied by two different data frames. - Blessy, Jun 16 '17 at 10:09

0 Answers