
This makes no sense to me and is rather frustrating.

I have:

dt[, .N, aggregate][order(-N)]

                  aggregate      N
    1:          0.000000000 459725
    2:          1.000000000  47072
    3:                   NA  33858
    4:          0.500000000  11391
    5:          0.005952381   7001
   ---                            
33439:          0.208722741      1
33440:          0.599567100      1
33441:          0.717073171      1
33442:          0.169515670      1
33443:          0.077205882      1

hist(dt$aggregate, main = "Distribution of aggregated rate, method 2", xlab = "aggregate rate", ylim = c(0,1000000))

But the plotted histogram shows a frequency of more than 600k at 0.0, not the ~460k counted above.

Is there some quirk that I do not know about?

I am not sure how I can show the plot here.

  • Looks like your `by=` group here is `aggregate`, which is a numeric value. So you will get the count for each exact value of `aggregate`, `0.000`, `0.001` etc. in the data.table summary. `hist`ograms instead give the count of values within a range like `0 - <0.5`, `>=0.5 - <1` etc. (see the sketch after these comments). – thelatemail May 06 '21 at 06:00
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Note that histograms bin data, so the values for 0 are most likely grouped together with values near zero; you don't get a count per unique number. – MrFlick May 06 '21 at 06:00
  • @thelatemail this was it, thank you – Jantje Houten May 06 '21 at 06:13
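
To make the comments concrete, here is a minimal sketch (a toy `data.table`, not the asker's actual data, with the column name `aggregate` borrowed from the question). It counts exact values with `.N` and then lets `hist()` bin the same column: the first histogram bar lumps 0 together with every other value that falls in the first bin, so it is far larger than the count of exact zeros.

    library(data.table)

    # Toy data: many exact zeros plus many small non-zero rates
    set.seed(1)
    dt <- data.table(aggregate = c(rep(0, 500), runif(300, 0, 0.05), runif(200)))

    # Count per exact value: 0 appears exactly 500 times
    dt[, .N, by = aggregate][order(-N)]

    # hist() counts values per bin; with default breaks the first bar here
    # covers roughly [0, 0.1], so it exceeds the number of exact zeros
    h <- hist(dt$aggregate, main = "Binned vs exact counts", xlab = "aggregate rate")
    h$counts[1]             # count in the first bin
    sum(dt$aggregate == 0)  # count of exact zeros only

With default `breaks`, `h$counts[1]` includes the 500 zeros plus every other value below the first break, which is the same effect that makes the 0.0 bar in the question's histogram look like >600k even though the exact value 0.0 occurs only ~460k times.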
