4

Is there a function in R for fitting a distribution to a frequency table? I'm aware of fitdistr, but as far as I can tell it only works on data vectors (random samples). I know that converting between the two formats is trivial, but the frequencies are so large that memory is a concern.

For example, fitdistr (from the MASS package) can be used as follows:

library(MASS)
x <- rpois(100, lambda = 10)
fitdistr(x, "poisson")

Is there a function that would do the same fitting on a frequency table? Something along these lines:

freqt <- as.data.frame(table(x))
fitfreqtable(freqt$x, weights=freqt$Freq, "poisson")

Thanks!

  • Can you give an example of your non-vector data that has these problems? – gung - Reinstate Monica Jun 23 '13 at 17:31
  • @gung, thank you for the quick reply. You're right, the question is only related to R so my apologies for posting off-topic. I'm flagging it as recommended. –  Jun 23 '13 at 20:32
  • No problem, @FlorinCoras. In the interim, would you mind editing your Q to give an example? When you get to SO, people will want to know. – gung - Reinstate Monica Jun 23 '13 at 20:35
  • I take it that reconstructing the original data is a non-option here? `y <- rep(freqt$x, freqt$Freq); fitdistr(y, "poisson")` – Dason Jun 23 '13 at 22:34
  • @Dason, I'd like to avoid it since frequencies may add up to billions. – Florin Coras Jun 23 '13 at 22:57
  • Are you just interested in the Poisson? Or are other distributions of interest as well? – Dason Jun 23 '13 at 23:30
  • If you just want the Poisson, you can maximize the likelihood quite directly; algebraically - the parameter estimate is just the mean, readily computed from the table, and indeed the variance of the estimator is quite straightforward as well. – Glen_b Jun 24 '13 at 00:19
  • @Dason and @Glen_b, I used Poisson just as an example. I'm looking for something as general as `fitdistr`. Thanks for the quick replies. – Florin Coras Jun 24 '13 at 06:38
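
The Poisson shortcut mentioned in the comments can be computed straight from the table without expanding it. A minimal sketch, assuming the table is held as two numeric vectors (`value` and `freq` are illustrative names, and the numbers are made up):

```r
# Frequency table: distinct values and how often each one occurred
# (toy numbers, purely illustrative)
value <- c(0, 1, 2, 3, 4, 5)
freq  <- c(5, 12, 20, 9, 3, 1)   # doubles, so very large totals do not overflow integers

n          <- sum(freq)               # total number of observations
lambda_hat <- sum(value * freq) / n   # Poisson MLE = frequency-weighted mean
se_lambda  <- sqrt(lambda_hat / n)    # asymptotic standard error of the MLE

c(lambda = lambda_hat, se = se_lambda)
```

With the `freqt` data frame from the question, note that `freqt$x` is a factor, so it would need `as.numeric(as.character(freqt$x))` first, and `as.numeric(freqt$Freq)` avoids integer overflow when the counts sum to billions.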

3 Answers

5

There's no built-in function that I know of for fitting a distribution to a frequency table. Note that, in theory, a continuous distribution is inappropriate for a table, since the data is discrete. Of course, for large enough N and a fine enough grid, this can be ignored.
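
When the grid is too coarse to ignore the discreteness, one option is a grouped-data likelihood, where each bin contributes its probability mass rather than a density value. This is only a sketch under assumed names (`grouped_negll`, `breaks`, `freq`) and simulated data, not a packaged routine:

```r
# Grouped-data likelihood for a gamma distribution: the probability of
# bin (a, b] is pgamma(b) - pgamma(a), and each bin contributes
# freq * log(probability) to the log-likelihood.
grouped_negll <- function(par, breaks, freq) {
    p <- diff(pgamma(breaks, shape = par[1], rate = par[2]))
    -sum(freq * log(pmax(p, 1e-300)))   # pmax guards against log(0)
}

# Toy data: simulate a gamma sample and bin it into unit-width intervals
set.seed(42)
x      <- rgamma(2000, shape = 2, rate = 0.5)
breaks <- c(seq(0, 15, by = 1), Inf)          # last bin is open-ended
freq   <- as.numeric(table(cut(x, breaks)))   # counts per bin, zeros included

fit <- optim(c(1, 1), grouped_negll, breaks = breaks, freq = freq,
             method = "L-BFGS-B", lower = c(1e-3, 1e-3))
fit$par   # estimates of (shape, rate)
```

Only the bin edges and counts enter the fit, so memory use depends on the number of bins, not on the number of observations.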

You can build your own model-fitting function using optim or any other optimizer, if you know the density that you're interested in. I did this here for a gamma distribution (which was a bad assumption for that particular dataset, but never mind that).

Code reproduced below.

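# Objective: -2 * log-likelihood, treating each observed count y[i] as
# Poisson with mean equal to the expected count under a gamma density.
negll <- function(par, x, y)
{
    shape <- par[1]
    rate <- par[2]
    mu <- dgamma(x, shape, rate) * sum(y)   # expected count at each x
    -2 * sum(dpois(y, mu, log=TRUE))
}

# g$count holds the observed frequencies (g is the data from the linked example, not defined here)
optim(c(1, 1), negll, x=seq_along(g$count), y=g$count, method="L-BFGS-B", lower=c(.001, .001))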
$par
[1] 0.73034879 0.00698288

$value
[1] 62983.18

$counts
function gradient 
      32       32 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
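
A more general variant of the same idea, closer to what `fitdistr` does, is to weight each distinct value's log-density by its frequency and hand that to `optim`. The sketch below is an illustration only: `weighted_negll`, `fit_freq_table` and `nb_logdens` are hypothetical helper names, and the negative binomial is just an example density.

```r
# Negative log-likelihood for a frequency table: each distinct value's
# log-density is weighted by the number of times it was observed.
weighted_negll <- function(par, value, freq, logdens) {
    -sum(freq * logdens(value, par))
}

# Hypothetical helper (not from any package): weighted ML fit via optim()
fit_freq_table <- function(value, freq, logdens, start, lower = -Inf, upper = Inf) {
    fit <- optim(start, weighted_negll, value = value, freq = freq,
                 logdens = logdens, method = "L-BFGS-B",
                 lower = lower, upper = upper, hessian = TRUE)
    list(estimate = fit$par,
         se = sqrt(diag(solve(fit$hessian))),   # from the numerical Hessian
         convergence = fit$convergence)
}

# Example: negative binomial with parameters (size, mu) on simulated data
set.seed(1)
x     <- rnbinom(5000, size = 3, mu = 10)
tab   <- table(x)
value <- as.numeric(names(tab))   # distinct observed values
freq  <- as.numeric(tab)          # how often each value occurred

nb_logdens <- function(v, par) dnbinom(v, size = par[1], mu = par[2], log = TRUE)
start <- c(1, sum(value * freq) / sum(freq))   # crude start: size = 1, mu = mean
res <- fit_freq_table(value, freq, nb_logdens, start = start, lower = c(1e-3, 1e-3))
res$estimate
```

Swapping in another density only requires a different `logdens` function, and the cost of each likelihood evaluation scales with the number of distinct values, not with the total count.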
Hong Ooi
  • thanks for your answer. I was hoping to avoid building my own model fitting functions but, as you mention, it seems there's no curve fitting procedure that works similarly to `fitdistr`. – Florin Coras Jun 24 '13 at 08:42
0

For a Poisson distribution, all you need is the sample mean: the maximum-likelihood estimate of lambda, the distribution's only parameter, is the mean. Example:

set.seed(1111)
sample<-rpois(n=10000,lambda=10)
mean(sample)
[1] 10.0191

which is almost equal to the lambda used to generate the sample (lambda=10). The small difference (0.0191) is due to sampling variability, and it tends to shrink as n increases. Alternatively, you can fit the distribution using an optimization method:

library(fitdistrplus)
fitdist(sample,"pois")

Fitting of the distribution ' pois ' by maximum likelihood 
Parameters:
       estimate Std. Error
lambda  10.0191 0.03165296

but for the Poisson this is unnecessary, since the estimate is just the sample mean. For theoretical information on fitting frequency data, you can see my answer here.

ntzortzis
0

The function fitmixturegrouped from the package ForestFit does the job for other distribution models using frequency-by-group data.

It can fit simple or mixture distribution models based on "gamma", "log-normal", "skew-normal", and "weibull".

For a Poisson distribution, the mean is the only parameter that is needed, so applying a simple summary function to your data would suffice (as suggested by ntzortzis).

tim