I have two sets of data:
Dataset 1:
-4.96600134256044
-4.78340374913002
-4.93136896680689
-4.80958108060998
-4.78688287192542
-4.9431452930913
-4.93676628405869
-4.87328189586985
-4.91867843591513
-4.72101863119006
-4.95749167305945
-4.79202404641664
-4.91265785779198
-4.94596580589554
-4.96595222256787
-4.7990191635208
-4.97194852291884
-4.78515347272161
-4.78340374913002
-4.8994168374135
-4.97206198058066
-4.95252689510477
-4.93963055552644
-4.95490836707013
-4.94133564424905
-4.78567577865158
-4.93963055552644
-4.93131563559386
-4.9710618452962
-4.90015209439797
-4.9665194453887
-4.93403567225855
-4.91165041153205
-4.85009602823937
-4.78340374913002
-4.77292978439906
-4.94782851444531
-4.64848347534667
-4.91165041153205
-4.82937702765807
-4.96202809430577
-4.7983814963622
-4.93198889539142
-4.97072129594592
-4.88775205449138
-4.96917754667146
-4.972240408012
-4.96062137229138
-4.84390165131993
-4.93630849353535
-4.92623245728544
-4.91859094033325
-4.89568644535618
-4.87243553740634
-4.76982873302833
-4.8953404941385
-4.94451830002783
-4.88104841757604
-4.80303414573805
-4.88705883246573
-4.96499558513462
-4.56610914869673
-4.96928985131163
-4.80780803677881
-4.9556234540787
-4.84808934356167
-4.72319662154655
-4.9575854510567
-4.96960730728536
-4.9056755790436
-4.94039653820335
-4.53920246550341
-4.97211181130125
-4.86213634700864
-4.96802952189005
-4.9717135485154
-4.82056508210921
-4.96777645971916
-4.94038569046493
-4.95173085290477
-4.83470303172871
-4.91551379314551
-4.93963055552644
-4.97211181086369
-4.807583383435
-4.97216236251657
-4.97232745985347
-4.91551379314551
-4.94522084426514
-4.89719997383376
-4.96071975048121
-4.93464863469402
-4.88775205449138
-4.91638381844513
-4.80256598250479
-4.79828215315771
-4.73688107699373
-4.88114134915641
-4.92310502488463
Dataset 2:
-4.96600134256044
-4.78340374913002
-4.93136896680689
-4.80958108060998
-4.78688287192542
-4.9431452930913
-4.93676628405869
-4.87328189586985
-4.91867843591513
-4.72101863119006
-4.95749167305945
-4.79202404641664
-4.91265785779198
-4.94596580589554
-4.96595222256787
-4.7990191635208
-4.97194852291884
-4.78515347272161
-4.78340374913002
-4.8994168374135
-4.97206198058066
-4.95252689510477
-4.93963055552644
-4.95490836707013
-4.94133564424905
-4.78567577865158
-4.93963055552644
-4.93131563559386
-4.9710618452962
-4.90015209439797
-4.9665194453887
-4.93403567225855
-4.91165041153205
-4.85009602823937
-4.78340374913002
-4.77292978439906
-4.94782851444531
-4.64848347534667
-4.91165041153205
-4.82937702765807
-4.96202809430577
-4.7983814963622
-4.93198889539142
-4.97072129594592
-4.88775205449138
-4.96917754667146
-4.972240408012
-4.96062137229138
-4.84390165131993
-4.93630849353535
-4.92623245728544
-4.91859094033325
-4.89568644535618
-4.87243553740634
-4.76982873302833
-4.8953404941385
-4.94451830002783
-4.88104841757604
-4.80303414573805
-4.88705883246573
-4.96499558513462
-4.56610914869673
-4.96928985131163
-4.80780803677881
-4.9556234540787
-4.84808934356167
-4.72319662154655
-4.9575854510567
-4.96960730728536
-4.9056755790436
-4.94039653820335
-4.53920246550341
-4.97211181130125
-4.86213634700864
-4.96802952189005
-4.9717135485154
-4.82056508210921
-4.96777645971916
-4.94038569046493
-4.95173085290477
-4.83470303172871
-4.91551379314551
-4.93963055552644
-4.97211181086369
-4.807583383435
-4.97216236251657
-4.97232745985347
-4.91551379314551
-4.94522084426514
-4.89719997383376
-4.96071975048121
-4.93464863469402
-4.88775205449138
-4.91638381844513
-4.80256598250479
-4.79828215315771
-4.73688107699373
-4.88114134915641
-4.92310502488463
I'm trying to plot them as histograms and then measure the overlap between the two histograms as a fraction of their total area. I tried the method suggested in this post, but it gave me a value larger than 1, which I didn't think was possible.
My code looks like this:
import numpy as np
import matplotlib.pyplot as plt

# dataset1 and dataset2 hold the values listed above (objects with .min()/.max(), e.g. NumPy arrays)
# Shared range so both histograms use the same bin edges
rng = (min(dataset1.min(), dataset2.min()), max(dataset1.max(), dataset2.max()))
n1, bins1, _ = plt.hist(dataset1, color=color1, alpha=0.75, bins=7, weights=np.ones_like(dataset1) / len(dataset1), range=rng)
n1_area = sum(np.diff(bins1) * n1)
n2, bins2, _ = plt.hist(dataset2, color=color2, alpha=0.75, bins=7, weights=np.ones_like(dataset2) / len(dataset2), range=rng)
n2_area = sum(np.diff(bins2) * n2)
overlap = np.minimum(n1, n2)
overlap_area = overlap.sum()
overlap_percentage = overlap_area / (n1_area + n2_area)
Does anyone have any idea why I'm getting a value that's over 1, and how to fix it so that I get the correct value?
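One thing worth checking in the code above: n1_area and n2_area are sums of bin height times bin width, while overlap_area is a plain sum of bin heights, so the numerator and denominator are not in the same units. With weights that sum to 1 and bins narrower than 1, that ratio can exceed 1. The following is only a minimal sketch of an area-based version, not a definitive fix. The function name overlap_fraction is hypothetical, and it uses np.histogram instead of plt.hist so it runs without a plot. It keeps the "total area" denominator from the question, which means two identical histograms give 0.5, not 1.

```python
import numpy as np

def overlap_fraction(a, b, bins=7):
    """Overlap area divided by the combined area of both histograms.

    Hypothetical helper for illustration; assumes a and b are 1-D,
    non-empty sequences of finite numbers.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared range so both histograms use identical bin edges
    rng = (min(a.min(), b.min()), max(a.max(), b.max()))
    n1, edges = np.histogram(a, bins=bins, range=rng,
                             weights=np.ones_like(a) / len(a))
    n2, _ = np.histogram(b, bins=bins, range=rng,
                         weights=np.ones_like(b) / len(b))
    widths = np.diff(edges)
    area1 = np.sum(widths * n1)
    area2 = np.sum(widths * n2)
    # Multiply the per-bin minimum by the bin width, like the two areas
    overlap_area = np.sum(widths * np.minimum(n1, n2))
    return overlap_area / (area1 + area2)
```

If a value of 1 for identical histograms is what is wanted, one option (again an assumption about the intended definition) is to divide overlap_area by a single histogram's area instead of the sum of both.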