I was going through the SciPy code for the two-sample KS test, which calculates the maximum distance between the CDFs of two given samples. My question is about the code that calculates the cumulative distribution function (CDF).
I fail to understand the logic in the lines that calculate the CDFs. First, data1 and data2 are sorted, and then np.searchsorted is used to find the position of each element of data_all in both data1 and data2. data_all is nothing but the concatenation of the sorted data1 and data2.
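To check my understanding of np.searchsorted, here is a minimal sketch with made-up values (sorted_sample and points are my own names, not from SciPy). As far as I can tell, with side='right' the returned index equals the number of elements in the sorted array that are less than or equal to each query point, so dividing by the sample size gives the fraction of the sample at or below that point.

```python
import numpy as np

# Made-up, already sorted sample (hypothetical values for illustration).
sorted_sample = np.array([1.0, 2.0, 2.0, 5.0])
# Points at which the empirical CDF is evaluated.
points = np.array([0.0, 2.0, 3.0, 5.0, 6.0])

# side='right' returns, for each point, how many sample values are <= it.
counts = np.searchsorted(sorted_sample, points, side='right')
fractions = counts / sorted_sample.size

print(counts)     # [0 3 3 4 4]
print(fractions)  # [0.   0.75 0.75 1.   1.  ]
```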
What if the minimum value of data2 is below the minimum of data1? Doesn't that violate the assumption that a CDF shouldn't decrease as the value increases? The lines in question are:
data_all = np.concatenate([data1,data2])
cdf1 = np.searchsorted(data1,data_all,side='right')/(1.0*n1)
cdf2 = (np.searchsorted(data2,data_all,side='right'))/(1.0*n2)
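
To make the question concrete, here is a minimal sketch of the case I am asking about, where the minimum of data2 is below the minimum of data1. The sample values and the wrapper function empirical_cdfs are my own, chosen only for illustration; only the three lines quoted above come from SciPy, and I am assuming n1 and n2 are the sample sizes.

```python
import numpy as np

def empirical_cdfs(data1, data2):
    """Hypothetical wrapper around the three quoted lines."""
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    n1 = data1.shape[0]  # assumed: n1 is the size of data1
    n2 = data2.shape[0]  # assumed: n2 is the size of data2
    data_all = np.concatenate([data1, data2])
    cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0 * n1)
    cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0 * n2)
    return data_all, cdf1, cdf2

# Made-up samples: min(data2) = 1.0 is below min(data1) = 3.0.
data1 = np.array([3.0, 4.0, 5.0])
data2 = np.array([1.0, 2.0, 6.0])

data_all, cdf1, cdf2 = empirical_cdfs(data1, data2)

# data_all is in concatenation order, not sorted by value.
print(data_all)  # [3. 4. 5. 1. 2. 6.]
print(cdf1)      # each entry: fraction of data1 <= matching entry of data_all
print(cdf2)      # each entry: fraction of data2 <= matching entry of data_all

# Reorder by the value at which each CDF entry was evaluated.
order = np.argsort(data_all, kind='stable')
print(data_all[order])
print(cdf1[order])
print(cdf2[order])

# Maximum absolute difference between the two arrays.
max_distance = np.max(np.abs(cdf1 - cdf2))
print(max_distance)
```

In this sketch cdf1 drops from 1 to 0 at the position where data_all switches from the data1 values to the data2 values, which is the decrease I am asking about. When the entries are reordered by the corresponding value in data_all, both arrays are non-decreasing, and the maximum absolute difference is unaffected by the ordering because it is taken elementwise.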