
I have data that looks like a univariate distribution but is represented by two variables: x, the independent variable, and y, which is proportional to the probability density at x (it would be the density itself if the area under the curve were 1). The y values are continuous. How can I transform this data set into a single variable so that I can fit a distribution to it?

The best I could come up with was:

library(fGarch)

# simulate data from skewed distributions
i <- 10000
x <- rsnorm(i, xi = 0.5)

x.fit <- snormFit(x)

x.dens <- density(x, bw = "SJ", n = length(x))

# turn each density value into an integer replication count
k <- 1e6
times <- floor(x.dens$y * k)

# upsample
out <- rep(x.dens$x, times)

# downsample
x.smp <- sample(x = out, size = length(x))

# show that it seemed to work
plot(x = x.dens$x, y = x.dens$y, type = "l")
lines(density(x.smp, bw = "SJ"), col = "red")

[Figure: the current way I do this; original density in black, density of the resampled data in red]

Is there a smarter way to do this that doesn't lead to an extreme number of ties? I suppose I could fit some kind of smoothing function to the "densities".
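One way to avoid ties, offered only as a sketch, is inverse-transform sampling: integrate the tabulated curve into a CDF, rescale it to end at 1, and push uniform random numbers through the interpolated inverse. The grid `x.grid` and the curve `y.prop` below are hypothetical stand-ins for the real (x, y) pairs (a scaled normal curve, not the skew-normal data above), and the code assumes y is strictly positive so that the CDF is strictly increasing.

```r
# Hypothetical tabulated input: x on a grid, y proportional to a density.
# A scaled normal curve stands in for the real data here.
x.grid <- seq(-4, 4, length.out = 512)
y.prop <- 3 * dnorm(x.grid, mean = 0.5, sd = 1.2)

# Cumulative trapezoidal integral of y, rescaled so that it ends at 1
dx  <- diff(x.grid)
cdf <- c(0, cumsum(dx * (head(y.prop, -1) + tail(y.prop, -1)) / 2))
cdf <- cdf / cdf[length(cdf)]

# Invert the CDF by linear interpolation (assumes y > 0, so no flat segments)
inv.cdf <- approxfun(cdf, x.grid, ties = mean)

# Push uniform draws through the inverse CDF to get a continuous sample
set.seed(1)
x.smp <- inv.cdf(runif(10000))
```

Because the draws are interpolated between grid points rather than repeated grid values, `x.smp` is essentially free of ties and can be passed to a fitting routine as an ordinary sample. The sample size is arbitrary, though, so any standard errors from such a fit reflect that choice rather than the original data.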

wdkrnls
  • Can you please include data and/or code that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? If y is really a probability density (i.e. you have lost the original sampling information), then you might not be able to do a proper *statistical* fit, although you can still find parameters that give a distribution "close" to the observations in some sense. – Ben Bolker May 10 '16 at 17:01
  • Take a look at the fitdistrplus package. The included vignette is a good reference on distribution fitting. – Dave2e May 10 '16 at 18:39
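The idea in the first comment, finding parameters that give a distribution "close" to the tabulated curve, can be sketched without resampling at all. The example below is a minimal illustration under assumptions: it uses a hypothetical uniform grid `x.grid` and curve `y.prop`, normalises y into weights, and minimises a density-weighted negative log-likelihood, which amounts to minimising the Kullback-Leibler divergence from the tabulated curve to the model. A normal family is used as a stand-in; the same pattern should apply to any density function with a `log` option.

```r
# Hypothetical tabulated input: x on a uniform grid, y proportional to a density
x.grid <- seq(-4, 4, length.out = 512)
y.prop <- 3 * dnorm(x.grid, mean = 0.5, sd = 1.2)

# Normalise y into weights that sum to 1 (on a uniform grid the spacing cancels)
w <- y.prop / sum(y.prop)

# Density-weighted negative log-likelihood for a normal model;
# the second parameter is log(sd) so that sd stays positive
nll <- function(par) {
  -sum(w * dnorm(x.grid, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Minimise with the default Nelder-Mead method from an arbitrary start
fit <- optim(c(0, 0), nll)
est <- c(mean = fit$par[1], sd = exp(fit$par[2]))
```

Here `est` holds the fitted mean and standard deviation. As the comment warns, this is curve matching rather than a statistical fit: with the original sample size lost, the objective carries no information about parameter uncertainty.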
