Estimating the Similarity between Two Unpaired Datasets

Question

I'm trying to compare the data (black) and the model (color). [Fig. 1]

There is another example [Fig. 2]. The data set and the model are different for Fig. 1 and Fig. 2.

In both cases there appears to be some overlap between the model and the data, but the overlap/matching is better in Fig. 2. I would like to quantify the correlation between the data and the model in both cases, so that I can compare the "goodness of fit" of the two figures. Which (statistical) method should I use to estimate the correlation quantitatively?

What is the meaning of the plots, though? For each black dot, is there a specific corresponding color dot? — Ami Tavory, Feb 02 '16 at 13:52
@AmiTavory No, there is no one-to-one relation between the data and the model. For each of the figures, the data represents the values of x and y for different conditions. I tried to model the various possibilities and represent my results with colored points. Now I'm trying to see if the model points [color] are good enough to represent the data [black]. — rana, Feb 02 '16 at 14:04
@rana can you elaborate what the plots show? what is modeled to what? and please don't say that the black points are modeled to the colored ones — Zachi Shtain, Feb 02 '16 at 14:06
@ZachiShtain I tried to explain the plot in my above comment. To be more precise, my data is a 2d array (x,y) of length m and my model is a 2d array (x,y) of length n. I'm interested to quantify the "overlap" between the data and the model as shown in the scatter plot. — rana, Feb 02 '16 at 14:18
@rana, it seems you are trying to compare between two datasets. Have you considered using the center of gravity and the disparity matrices for the comparison? — Zachi Shtain, Feb 02 '16 at 14:23
@ZachiShtain it will be helpful if you kindly explain the methods. thanks! — rana, Feb 02 '16 at 16:29

score 0 · Answer 1 · answered Feb 03 '16 at 09:48

You could start by calculating the center of gravity of each dataset using numpy.mean and comparing how close they are to one another. The next step is to check whether each center lies inside the uncertainty ellipse (http://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/) of the other dataset.
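
A minimal sketch of these two steps, assuming each dataset is an (N, 2) NumPy array and that the scatter is roughly Gaussian, so the 95% ellipse can be taken from a chi-square quantile with 2 degrees of freedom. The helper name `center_in_ellipse`, the confidence level, and the synthetic `data` and `model` arrays are illustrative assumptions, not part of the question:

```python
import numpy as np
from scipy.stats import chi2

def center_in_ellipse(points, center, confidence=0.95):
    """Return True if `center` lies inside the covariance ellipse of `points`.

    Assumes `points` has shape (N, 2) and is roughly Gaussian.
    """
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)          # 2x2 sample covariance
    diff = center - mean
    # Squared Mahalanobis distance of `center` from the cloud's mean.
    d2 = diff @ np.linalg.solve(cov, diff)
    # The ellipse boundary is where d2 equals the chi-square quantile.
    return bool(d2 <= chi2.ppf(confidence, df=2))

# Synthetic stand-ins for the real arrays (hypothetical values).
rng = np.random.default_rng(0)
data = rng.normal([0.0, 0.0], 1.0, size=(200, 2))    # "black" points
model = rng.normal([0.3, -0.2], 1.0, size=(150, 2))  # "colored" points

data_center = np.mean(data, axis=0)
model_center = np.mean(model, axis=0)

# Step 1: distance between the two centers of gravity.
distance = np.linalg.norm(data_center - model_center)

# Step 2: is each center inside the other dataset's ellipse?
model_in_data = center_in_ellipse(data, model_center)
data_in_model = center_in_ellipse(model, data_center)

print(distance, model_in_data, data_in_model)
```

Note that this only compares location and spread of the two clouds; two differently shaped clouds can still share a center.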

Finally, I would recommend hypothesis testing, such as Student's t-test or an F-test. SciPy has some functions for these kinds of tests; just look at the documentation.
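
As a rough illustration of what such tests could look like, the sketch below runs a per-coordinate Welch t-test on the means with `scipy.stats.ttest_ind` and a hand-rolled two-sided F-test on the variances built from `scipy.stats.f`. The helper `compare_axis` and the synthetic arrays are assumptions made for the example; both tests also assume approximately normal samples, and they compare means and variances one axis at a time rather than the full 2D overlap:

```python
import numpy as np
from scipy import stats

def compare_axis(a, b):
    """Return (t-test p-value, F-test p-value) for two 1D samples."""
    # Welch's t-test: are the means equal? (no equal-variance assumption)
    _, t_p = stats.ttest_ind(a, b, equal_var=False)

    # Two-sided F-test on the ratio of sample variances.
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    f_p = 2.0 * min(stats.f.cdf(f_stat, dfn, dfd),
                    stats.f.sf(f_stat, dfn, dfd))
    return float(t_p), float(min(f_p, 1.0))

# Synthetic stand-ins for the real (N, 2) arrays (hypothetical values).
rng = np.random.default_rng(0)
data = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
model = rng.normal([0.3, -0.2], 1.0, size=(150, 2))

for axis, name in enumerate(["x", "y"]):
    t_p, f_p = compare_axis(data[:, axis], model[:, axis])
    print(f"{name}: t-test p = {t_p:.3g}, F-test p = {f_p:.3g}")
```

Small p-values suggest that the data and the model differ in mean or variance along that axis; large p-values only mean the test found no evidence of a difference.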
