I have a data frame of many numeric variables. Is there a way of calculating (not plotting) areas of the global density which are less dense than others? In other words, is there a way of locating areas of the hyperspace which are very sparsely populated with data points?
Assuming that your data frame looks like this:
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
                 y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10)))
You can store the density estimate of each variable in its own object:
dsx <- density(df$x)
dsy <- density(df$y)
Now look at the result of dsx, for instance. You will see that we get a list which contains:
dsx$x: the coordinates where the density is evaluated
dsx$y: the estimated density at those coordinates
If you want to find the coordinates of sparsely populated areas, you just need to retrieve the coordinates corresponding to low densities:
dsx$x[which(dsx$y < 0.03)] # returns coordinates for which density(x) < 0.03
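The filter above returns individual grid points. If you would rather have the sparse regions as intervals, a minimal sketch is to run-length encode the low-density flag. The 0.03 cutoff is the same illustrative value as above, and the names low, runs and sparse_intervals are made up for this example.

```r
set.seed(1)
x <- c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3))
dsx <- density(x)

threshold <- 0.03           # illustrative cutoff, tune it to your data
low <- dsx$y < threshold    # TRUE where the estimated density is low

# Run-length encode the flags to find consecutive runs of grid points
runs   <- rle(low)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where the density is below the threshold
sparse_intervals <- data.frame(
  from = dsx$x[starts[runs$values]],
  to   = dsx$x[ends[runs$values]]
)
sparse_intervals
```

Each row is one contiguous stretch of the x axis where the estimated density stays below the cutoff.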
To combine all your coordinates (here x and y), I would create a data frame with the coordinates and their densities, and filter it based on the density values:
df_ds <- data.frame(dsx$x, dsy$x, dsx$y, dsy$y)
df_ds[which((df_ds$dsx.y < 0.03) & (df_ds$dsy.y < 0.01)), c("dsx.x","dsy.x")]
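Note that the data frame above pairs the i-th grid point of x with the i-th grid point of y, so it filters on the two marginal densities rather than on a joint one. For two variables, one possible alternative (a sketch, not part of the approach above) is a joint estimate with kde2d() from the MASS package. The 50 x 50 grid and the 10% quantile cutoff are arbitrary choices made for this example.

```r
library(MASS)  # provides kde2d(); MASS is a recommended package bundled with R

set.seed(1)
df <- data.frame(
  x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
  y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10))
)

# Joint density estimate on a 50 x 50 grid (grid size is an assumption)
kd <- kde2d(df$x, df$y, n = 50)

# All (x, y) grid combinations; expand.grid varies x fastest,
# which matches the column-major order of as.vector(kd$z)
grid <- expand.grid(x = kd$x, y = kd$y)
grid$density <- as.vector(kd$z)

# Hypothetical cutoff: grid cells in the lowest 10% of estimated densities
cutoff <- quantile(grid$density, 0.10)
sparse_cells <- grid[grid$density < cutoff, c("x", "y")]
head(sparse_cells)
```

Grid-based estimates like this become impractical as the number of variables grows, since the number of cells increases exponentially with the dimension.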
By default, you will get 512 density values per coordinate. You may want to increase this resolution by setting n in density. Be sure to set the same value for each of your coordinates.
dsx <- density(df$x, n = 2048)
dsy <- density(df$y, n = 2048)
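The reason for keeping n identical is that data.frame() recycles vectors whose lengths are whole multiples of each other, so mixing 512 and 2048 points would not raise an error but would silently misalign the rows. A small sketch with an explicit length check follows; n_grid is a name introduced for this example.

```r
set.seed(1)
df <- data.frame(
  x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
  y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10))
)

n_grid <- 2048  # one shared grid size for every variable

dsx <- density(df$x, n = n_grid)
dsy <- density(df$y, n = n_grid)

# Fail early if the grids differ, instead of letting data.frame() recycle
stopifnot(length(dsx$x) == length(dsy$x))

df_ds <- data.frame(dsx$x, dsy$x, dsx$y, dsy$y)
nrow(df_ds)
```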

AshOfFire
this is a good option for 1-2D cases, but I need a multivariate estimation, where the multivariate distribution is unknown – Omry Atia Apr 10 '18 at 13:57