
I am currently working with the following data:
Data_A of 10,000 samples each with 170 features
Data_B of 1,000 samples each with same 170 features

If we plot Data_A in the 170-dimensional feature space, it occupies some region. I want to know what percentage of the samples in Data_B fall inside that region. I don't need to visualize anything; I just need that subset.
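
The simplest reading of "the space covered by Data_A" is the per-feature range of Data_A, i.e. an axis-aligned bounding box. The sketch below assumes that interpretation (it is an assumption, not the only possible definition) and needs only NumPy; the function name `fraction_in_bounding_box` is hypothetical.

```python
import numpy as np

def fraction_in_bounding_box(data_a, data_b):
    """Fraction of rows of data_b inside the per-feature [min, max] box of data_a.

    Assumption: "the space covered by Data_A" is approximated by an
    axis-aligned box built from each feature's observed range in Data_A.
    """
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)
    lo = a.min(axis=0)  # per-feature minimum over Data_A
    hi = a.max(axis=0)  # per-feature maximum over Data_A
    # A row counts as inside only if every one of its features lies in [lo, hi].
    inside = np.all((b >= lo) & (b <= hi), axis=1)
    return inside.mean(), inside

# Tiny example in 2 dimensions instead of 170.
data_a = np.array([[0.0, 0.0], [1.0, 1.0]])
data_b = np.array([[0.5, 0.5], [2.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
frac, mask = fraction_in_bounding_box(data_a, data_b)
print(frac)  # 0.5
```

With pandas DataFrames the same call works on `df.values`. A box is a very loose envelope: in 170 dimensions it contains large empty regions, so a Data_B sample can be "inside" while being far from every Data_A sample, and a single outlier in Data_A widens the box for that feature.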

(For context: in Data_B I deliberately included 800 samples that are similar to samples in Data_A and 200 samples that are quite different from them.)
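
Because the construction of Data_B is known, any method can be checked against it. The sketch below assumes, hypothetically, that the 800 similar samples are the first 800 rows of Data_B; adjust `n_similar` or reorder the rows if that is not the case. It takes a boolean inside/outside mask from whatever method is being tried (for OneClassSVM, for example, `predict(...) == 1`, since `predict` returns +1 for inliers and -1 for outliers).

```python
import numpy as np

def summarize_known_split(inside, n_similar=800):
    """Compare an inside/outside mask for Data_B with how Data_B was built.

    Hypothetical assumption: rows [0, n_similar) are the samples made similar
    to Data_A, and the remaining rows are the deliberately different ones.
    """
    inside = np.asarray(inside, dtype=bool)
    similar, different = inside[:n_similar], inside[n_similar:]
    return {
        "similar_flagged_inside": float(similar.mean()),      # ideally close to 1.0
        "different_flagged_inside": float(different.mean()),  # ideally close to 0.0
        "overall_inside": float(inside.mean()),               # ideally close to 0.8 for 800/1000
    }

# A perfect mask for the 800 + 200 construction.
print(summarize_known_split(np.array([True] * 800 + [False] * 200)))
```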

I have tried OneClassSVM, but it does not give good results; moreover, its results depend entirely on its parameters (nu, gamma, kernel, etc.). I would have to retune a model like this every time I get a new set of training and test data, which I want to avoid.
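
One alternative with a single, interpretable knob (this is a swapped-in technique, not something from the question) is a nearest-neighbour distance test: standardize with Data_A's statistics, measure how far each Data_A sample is from its nearest other Data_A sample, and call a Data_B sample "inside" if its nearest Data_A neighbour is no farther than a chosen percentile of those distances. The sketch below assumes Euclidean distance on standardized features; the function names are hypothetical. It uses brute-force NumPy in chunks, which for 10,000 × 10,000 rows keeps each chunk at roughly 40 MB.

```python
import numpy as np

def nn_distances(queries, reference, exclude_self=False, chunk=500):
    """Euclidean distance from each query row to its nearest row in reference.

    Computed in chunks so the full distance matrix is never held in memory.
    With exclude_self=True, queries must be the same array as reference, and
    each row's zero distance to itself is ignored (leave-one-out).
    """
    q = np.asarray(queries, dtype=float)
    r = np.asarray(reference, dtype=float)
    r_sq = np.einsum("ij,ij->i", r, r)  # squared norms of reference rows
    out = np.empty(len(q))
    for start in range(0, len(q), chunk):
        block = q[start:start + chunk]
        # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2
        d2 = np.einsum("ij,ij->i", block, block)[:, None] - 2.0 * block @ r.T + r_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
        if exclude_self:
            idx = np.arange(len(block))
            d2[idx, start + idx] = np.inf  # ignore each row's match with itself
        out[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return out

def fraction_within_nn_threshold(data_a, data_b, percentile=99.0):
    """Fraction of data_b rows whose nearest data_a row is within the threshold."""
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)
    # Standardize with Data_A statistics so no single feature dominates the distance.
    mu = a.mean(axis=0)
    sd = a.std(axis=0)
    sd[sd == 0] = 1.0  # constant features: avoid division by zero
    a = (a - mu) / sd
    b = (b - mu) / sd
    # Threshold: the given percentile of Data_A's own leave-one-out NN distances.
    radius = np.percentile(nn_distances(a, a, exclude_self=True), percentile)
    inside = nn_distances(b, a) <= radius
    return inside.mean(), inside
```

The `percentile` argument still has to be chosen, but it has a direct meaning: roughly the share of Data_A itself that the rule would accept. That tends to make it easier to keep fixed across new data sets than `nu`/`gamma`, though this is not guaranteed.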

Is there any other simple technique or model to do this in Python? Is there a Python module that can do this using set theory?
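
A set-based definition of "the space occupied by Data_A" is its convex hull: the smallest convex set containing every Data_A sample. In 170 dimensions the hull cannot practically be built explicitly, but membership can be tested as a linear-programming feasibility problem: x is inside if there are weights λ ≥ 0 with Σλ = 1 and Σ λᵢ aᵢ = x. The sketch below assumes SciPy's `scipy.optimize.linprog` with the HiGHS solver (status 0 means a feasible point was found, status 2 means infeasible); the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, x):
    """True if x is a convex combination of the rows of points (LP feasibility check)."""
    p = np.asarray(points, dtype=float)
    n = p.shape[0]
    # Equality constraints: p.T @ lam == x and sum(lam) == 1.
    a_eq = np.vstack([p.T, np.ones((1, n))])
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    # Zero objective: only feasibility matters; lam >= 0 via bounds.
    res = linprog(np.zeros(n), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.status == 0  # 0 = solved (feasible), 2 = infeasible

def fraction_in_convex_hull(data_a, data_b):
    """Fraction of data_b rows lying in the convex hull of data_a, plus the mask."""
    inside = np.array([in_convex_hull(data_a, row) for row in np.asarray(data_b, dtype=float)])
    return inside.mean(), inside

square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(in_convex_hull(square, [0.5, 0.5]), in_convex_hull(square, [2.0, 2.0]))  # True False
```

For Data_B this means 1,000 LPs with 10,000 variables and 171 constraints each, which is slow but parameter-free. Two caveats: a convex hull fills in any non-convex gaps in Data_A, and in high dimensions it may also exclude some samples that come from the same distribution as Data_A, so it is worth checking the rule on a few held-out Data_A rows first.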

Pardon me if I have not explained the problem clearly.

Blessy
  • Are you using `pandas`? – Anirudh Sridhar Jun 15 '17 at 11:13
  • Yes, I am using pandas for reading those data sets (Data_A and Data_B). – Blessy Jun 15 '17 at 11:24
  • Maybe [this](https://stackoverflow.com/questions/17095101/outputting-difference-in-two-pandas-dataframes-side-by-side-highlighting-the-d) can help. – Anirudh Sridhar Jun 15 '17 at 11:32
  • Thanks @Anirudh, but the question you shared deals with differences in the values of particular features of samples, whereas what I am looking for is the intersection of the spaces occupied by two different data frames. – Blessy Jun 16 '17 at 10:09
