
Sorry if this question is trivial, but I see no solution. I've been using the density() function frequently, always without trouble, but now I'm working with a data set (let's call it tab) with many relatively small values, and suddenly density(tab) gives something that looks like absolute frequencies. Any ideas what I did wrong?

(Note: Also hist(tab, freq = FALSE) gives something weird for tab.)

Remark: summary(tab) gives:

          Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    -0.0042810   0.0002679   0.0011750   0.0071690   0.0049510   0.5839000

I'd also be very grateful for any general hint as to the circumstances under which density() does not give relative frequencies as y-values.

Machavity
chris17
  • Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Jun 14 '16 at 07:08
  • What is a relative small value? Relative to what? `-1e100`? `1e-100`? :) – Therkel Jun 14 '16 at 07:11
  • Hi! Thanks so much for your answer - if I could reproduce/generalize the phenomenon I would be happy. As indicated, density() usually works fine except for this strange data set - maybe someone has a clue what could have gone wrong. – chris17 Jun 14 '16 at 07:14
  • Oh no, "only" 1e-10, so R should not have a problem. Still, I mentioned it because that's the only "relevant" info I have about my data set. (Once again, I am sorry that I cannot formulate the question more concisely; everything seems so normal in "tab", yet density() gives such strange results here.) – chris17 Jun 14 '16 at 07:16
  • What is the output of `summary(tab)`? – Roland Jun 14 '16 at 07:28
  • I don't understand how it is operating differently than usual. If you integrate over the density output, it returns 1 as we would expect from a density; `sum(density(tab)$y[-1]*diff(density(tab)$x))`. A density function can easily take values greater than 1. Perhaps this question can help you: [How can a probability density be greater than one and integrate to one](http://math.stackexchange.com/questions/105455/how-can-a-probability-density-be-greater-than-one-and-integrate-to-one) – Therkel Jun 14 '16 at 08:52
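
The check suggested in the last comment can be tried on simulated data. The following is only a minimal sketch: the real `tab` is not available, so a hypothetical stand-in is drawn from a narrow normal distribution (the mean, SD and seed are assumptions chosen for illustration).

```r
# Hypothetical stand-in for `tab`: many small values clustered near zero.
set.seed(1)
tab <- rnorm(1000, mean = 0.001, sd = 0.003)

d <- density(tab)

# Height of the estimated density at its peak. Because the data span a very
# narrow range, the curve has to be tall for the area under it to equal 1.
peak <- max(d$y)
peak

# Riemann-sum approximation of the area under the curve (same expression as
# in the comment above, computed once on the stored result).
area <- sum(d$y[-1] * diff(d$x))
area
```

On data like this, the peak should come out far above 1 while the area stays close to 1, which is what a density (as opposed to a relative frequency) is expected to do.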

1 Answer

While I can't exactly reproduce your example, it looks to me like you have a huge outlier in your dataset. Your 3rd quartile is about 0.005, but the maximum value is 0.584. On the real axis, the distance from your 3rd quartile to your minimum value is about 0.009, while the distance from the 3rd quartile to the maximum value is about 0.579. That's more than 60 times farther! As far as I understand, density() picks a single bandwidth that is used for all values. In this case the bandwidth is likely to be very small, given that most values are clustered close to 0. You might then get a very degenerate density plot, with two vertical lines, one on the left and one on the right. I was able to generate one such plot using:

plot(density(c(rnorm(100, 0, 0.001), 100)))

All I do is take a sample of 100 values from a normal distribution with an SD of 0.001 and add a single outlier, 100, to it. The density then looks something like this:

[Figure: degenerate density plot]

The density values sure look like they could be confused with frequencies, but they are not. Of course, if I remove the outlier, the estimated density function becomes nicely bell-shaped:

[Figure: regular density plot]

So, it seems likely that you need to remove an outlier from your data.
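
To see why the outlier has this effect, it may help to compare the bandwidth that density() selects with the range it has to cover. This is a rough sketch using the same simulated data as above rather than the asker's `tab`; the seed and the variable names are assumptions made for illustration.

```r
set.seed(1)
clean        <- rnorm(100, 0, 0.001)   # sample without the outlier
with_outlier <- c(clean, 100)          # same sample plus one extreme value

d_clean   <- density(clean)
d_outlier <- density(with_outlier)

# Bandwidth chosen by the default rule (bw.nrd0). It is driven by the smaller
# of the SD and IQR / 1.34, so a single outlier barely changes it.
bw_clean   <- d_clean$bw
bw_outlier <- d_outlier$bw

# Width of the evaluation grid, measured in bandwidths. With the outlier, the
# grid spans about 100 units while the kernel is still tiny.
ratio_clean   <- diff(range(d_clean$x)) / bw_clean
ratio_outlier <- diff(range(d_outlier$x)) / bw_outlier

c(bw_clean = bw_clean, bw_outlier = bw_outlier)
c(ratio_clean = ratio_clean, ratio_outlier = ratio_outlier)
```

If the second ratio is several orders of magnitude larger than the first, the kernels are far narrower than the spacing of the evaluation grid, which would explain the spike-like plot.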

bogdata