
I am trying to make a split violin plot with ggplot2, something like this. I found some very good code, but I cannot use it: when I try to create pdat, it comes out empty and I do not know why. Below is a summary of my data, what I am doing, and the result. Can anyone help, please?

My data summary:

summary(object = my_data)
      Sample    Treatment         VAR1            VAR2      
 Sample_1:500   Cond1:1000   Min.   :36.00   Min.   :21.13  
 Sample_2:500   Cond2:1000   1st Qu.:90.00   1st Qu.:36.92  
 Sample_3:500                Median :90.00   Median :38.11  
 Sample_4:500                Mean   :88.91   Mean   :37.53  
                             3rd Qu.:90.00   3rd Qu.:38.90  
                             Max.   :90.00   Max.   :40.60  


dput(head(my_data, 20))
structure(list(Sample = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Sample_1", 
"Sample_2", "Sample_3", "Sample_4"), class = "factor"), Treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Cond1", "Cond2"), class = "factor"), 
    VAR1 = c(90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 
    90, 90, 90, 90, 90, 90, 90, 90), VAR2 = c(34.8888888888889, 
    38.2333333333333, 32.8333333333333, 37.7111111111111, 38.4111111111111, 
    36.7222222222222, 34.5555555555556, 35.7666666666667, 37.7111111111111, 
    37.3777777777778, 36.4888888888889, 37.8222222222222, 35.4777777777778, 
    34.0333333333333, 37.1222222222222, 39.0555555555556, 38.5666666666667, 
    34.8555555555556, 38.6, 34.6555555555556)), .Names = c("Sample", 
"Treatment", "VAR1", "VAR2"), row.names = c(NA, 20L), class = "data.frame")

What I am doing:

library(dplyr)
pdat <- my_data %>%
  group_by(Sample, Treatment) %>%
  do(data.frame(loc = density(.$VAR2)$Sample,
                dens = density(.$VAR2)$VAR2))

Result:

summary(object = pdat)
      Sample  Treatment
 Sample_1:0   Cond1:0  
 Sample_2:0   Cond2:0  
 Sample_3:0            
 Sample_4:0 

1 Answer

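The density function returns an object with components x (the locations) and y (the corresponding density estimates). Your code refers to the nonexistent components $Sample and $VAR2, which are NULL, so each group produces an empty data frame.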

pdat <- my_data %>%
  group_by(Sample, Treatment) %>%
  do(data.frame(loc = density(.$VAR2)$x,
                dens = density(.$VAR2)$y))
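
To see why the original call produced an empty pdat, it can help to inspect the object that density returns outside of the dplyr pipeline. The following is only a minimal base-R sketch on made-up numbers (the vector x and the names empty and fixed are illustrative, not taken from the question's data):

```r
# Made-up values standing in for one group's VAR2; not the asker's data.
set.seed(1)
x <- rnorm(100, mean = 37, sd = 2)

d <- density(x)

# The result is a list-like object; its components include "x" and "y".
print(names(d))

# Asking for a component that does not exist silently returns NULL.
print(is.null(d$Sample))

# A data frame built from NULL columns has no rows and no columns,
# which is what do() received for every group in the original code.
empty <- data.frame(loc = d$Sample, dens = d$VAR2)
print(dim(empty))

# Using the real components gives one row per grid point
# (density() evaluates on n = 512 points by default).
fixed <- data.frame(loc = d$x, dens = d$y)
print(nrow(fixed))
```

Storing the result of density in a variable first, as done here, also avoids estimating the density twice per group, although calling it twice as in the code above gives the same values.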
Lstat