multivariate density calculations in R

Question

I have a data frame of many numeric variables. Is there a way of calculating (not plotting) areas of the global density which are less dense than others? In other words, is there a way of locating areas of the hyperspace which are very sparsely populated with data points?

score 0 · Answer 1 · answered Apr 10 '18 at 13:35

Assuming that your data frame looks like this:

set.seed(1)  # make the random example reproducible
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
                 y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10)))

You can estimate the density of each variable separately with density():

dsx <- density(df$x)
dsy <- density(df$y)

Now look at the result of dsx, for instance. It is an object of class "density", a list that contains, among other things:

dsx$x: the coordinates where the density is evaluated
dsx$y: the estimated density at those coordinates
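As a quick check of what density() returns, here is a small sketch (it regenerates the example data; the seed and the names grid_step and total are illustrative, not from the answer). It confirms that the default grid has 512 evenly spaced points and that the estimated density sums to roughly 1 over that grid:

```r
set.seed(1)  # illustrative seed, only for reproducibility
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
                 y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10)))
dsx <- density(df$x)

# dsx is a list of class "density": x is an evenly spaced grid, y the estimate on it
print(class(dsx))
print(length(dsx$x))  # 512 grid points by default

# Riemann sum over the evenly spaced grid; it should be close to 1
grid_step <- diff(dsx$x[1:2])
total <- sum(dsx$y) * grid_step
print(total)
```

Because the grid extends a few bandwidths past the data range by default, almost all of the probability mass falls on it, which is why the sum lands near 1.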

To find the coordinates of sparsely populated areas, retrieve the coordinates that correspond to low densities:

dsx$x[dsx$y < 0.03]  # returns the coordinates where the estimated density of x is below 0.03
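A long vector of grid coordinates is hard to read. One way to turn the flagged grid points into contiguous x-intervals is to run-length encode the logical flags with rle(). This is a sketch that is not part of the original answer; threshold and sparse_intervals are illustrative names, and the 0.03 cut-off is the same arbitrary value used above:

```r
set.seed(1)
x <- c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3))
dsx <- density(x)

threshold <- 0.03              # same cut-off as above; adjust to your data
sparse <- dsx$y < threshold    # TRUE for each grid point below the cut-off

# Group consecutive grid points with the same flag into runs
runs <- rle(sparse)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs that are sparse and report their endpoints on the x grid
keep <- runs$values
sparse_intervals <- data.frame(from = dsx$x[starts[keep]],
                               to   = dsx$x[ends[keep]])
print(sparse_intervals)
```

Each row is a stretch of the x axis where the estimated density stays below the threshold. The two tails of the grid normally appear as the first and last intervals.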

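To combine all your coordinates (here x and y), I would create a data frame with the coordinates and their densities and filter it on the density values. Keep in mind that each row pairs the i-th grid point of x with the i-th grid point of y, so this filters the two marginal densities separately rather than estimating their joint density.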

df_ds <- data.frame(dsx$x, dsy$x, dsx$y, dsy$y)
df_ds[which((df_ds$dsx.y < 0.03) & (df_ds$dsy.y < 0.01)), c("dsx.x","dsy.x")]
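For two variables, you can estimate the joint density directly instead. The sketch below swaps in MASS::kde2d(), which is not used in the original answer. MASS is a recommended package that ships with standard R installations. The function evaluates a bivariate kernel density on a grid, and the sketch keeps the grid points with the lowest values. The names low_cut and sparse_points are illustrative, and the 10% quantile cut-off is an arbitrary choice rather than a recommendation:

```r
library(MASS)  # recommended package bundled with standard R installs

set.seed(1)
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
                 y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10)))

# Joint kernel density estimate on a 100 x 100 grid spanning the data range
kd <- kde2d(df$x, df$y, n = 100)

# kd$z[i, j] is the estimated density at (kd$x[i], kd$y[j]);
# expand.grid varies x fastest, matching the column-major order of as.vector()
grid <- expand.grid(x = kd$x, y = kd$y)
grid$density <- as.vector(kd$z)

# Illustrative cut-off: the lowest 10% of estimated density values
low_cut <- quantile(grid$density, 0.10)
sparse_points <- grid[grid$density < low_cut, c("x", "y")]
print(head(sparse_points))
```

Unlike the row-by-row table of marginals, each row of grid here is a genuine point in the (x, y) plane together with its joint density.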

By default, density() evaluates the density at 512 points per variable. For a finer grid, increase n in density(). Be sure to use the same value of n for each of your variables, so that the rows of the combined data frame line up.

dsx <- density(df$x, n = 2048)
dsy <- density(df$y, n = 2048)
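With many numeric variables, you can guarantee that every variable uses the same grid size by calling density() once per column with lapply() and then collecting the results column-wise. This is a sketch; n_grid, dens, dens_y, grids and low_everywhere are illustrative names, and the single 0.03 cut-off is arbitrary. As with the two-variable table above, rows line up by grid index, not by a shared point in the joint space:

```r
set.seed(1)
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
                 y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10)))

n_grid <- 2048                          # one grid size shared by every column
dens <- lapply(df, density, n = n_grid) # one "density" object per column

# n_grid x ncol(df) matrices: grid coordinates and estimated densities
grids  <- sapply(dens, `[[`, "x")
dens_y <- sapply(dens, `[[`, "y")

# Grid rows where every variable's marginal density is below the cut-off
low_everywhere <- apply(dens_y < 0.03, 1, all)
print(head(grids[low_everywhere, ]))
```
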

this is a good option for 1-2D cases, but I need a multivariate estimation, where the multivariate distribution is unknown — Omry Atia, Apr 10 '18 at 13:57
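For the fully multivariate case, one technique that avoids choosing a density model is the k-nearest-neighbour distance, which is swapped in here and is not part of the answer above. The distance from a point to its k-th nearest neighbour is inversely related to the local density (it is the basis of the k-NN density estimator), so points with large k-NN distances sit in sparsely populated regions of the hyperspace. This base-R sketch uses hypothetical 5-dimensional data, and the values of k and the 95% quantile cut-off are illustrative. Because dist() builds the full n x n distance matrix, this approach only suits data sets of up to a few thousand rows:

```r
set.seed(1)
# Hypothetical 5-dimensional data: a dense cluster plus a few scattered points
m <- rbind(matrix(rnorm(500, 0, 1), ncol = 5),
           matrix(runif(25, -8, 8), ncol = 5))

k <- 10  # number of neighbours; an illustrative choice

# Scale columns so that no single variable dominates the Euclidean distance
d <- as.matrix(dist(scale(m)))
diag(d) <- Inf  # ignore each point's zero distance to itself

# Distance to the k-th nearest neighbour: large values mean a sparse neighbourhood
knn_dist <- apply(d, 1, function(row) sort(row)[k])

# Flag roughly the 5% of points with the emptiest neighbourhoods
sparse <- knn_dist > quantile(knn_dist, 0.95)
print(which(sparse))
```
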
