0

So I have a datatable that looks something like that following. ID represents an object, bin represents how I am segmenting the data, and percent is how much of a data falls into that bin.

id  bin  percent
2   8   0.20030698388
2   16  0.14504988488
2   24  0.12356101304
2   32  0.09976976208
2   40  0.09056024558
2   48  0.07137375287
2   56  0.04067536454
2   64  0.03914044512
2   72  0.02916346891
2   80  0.16039907904
3   8   0.36316695352
3   16  0.03958691910
3   24  0.11876075731
3   32  0.13253012048
3   40  0.03098106712
3   48  0.07228915662
3   56  0.07745266781
3   64  0.02581755593
3   72  0.02065404475
3   80  0.11876075731

I am looking for a function to turn this dataset into a cdf partitioning id. I have tried cume_dist and percent_rank, but they do not appear to work.

mumair
  • 51
  • 6
  • Not sure which sql you're using, but you just need a cumsum. See e.g. https://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum – IceCreamToucan Jul 10 '18 at 21:28

1 Answers1

0

I am facing a similar problem and found this great tutorial for doing exactly that:

https://dwaincsql.com/2015/05/14/excel-in-t-sql-part-2-the-normal-distribution-norm-dist-density-functions/

It tries to rebuild the Excel function NORM.DIST function which gives you either the PDF if you set the cummulative flag as FALSE and the CDF if you set it as TRUE. I assumed that CUME_DIST would do the exact same thing in SQL. However, it turns out that the latter distributes by counting the elements whereas Excel uses the relative differences in the values.

Thomas L.
  • 524
  • 1
  • 5
  • 17