
I have a data set with 4 variables; one of them is a dummy (exits) stating whether the individual graduated from a particular program. For each of the other 3 variables, I need to create two new variables: the mean for exits == 1 and the mean for exits == 0. This is my code. I want to make it more efficient, since afterwards I want to create a new data.frame for exits == 0 and subtract the two.

 library(dplyr)
 summary_means_1 = bf %>%
   filter(exits == 1) %>% 
   summarise(
     # note: bf$... refers to the full, unfiltered bf, not the filtered rows
     v1_1 = as.double(mean(bf$v25_grad, na.rm = TRUE)),
     v2_1 = as.double(mean(bf$v29_read, na.rm = TRUE)),
     v3_1 = as.double(mean(bf$v30_math, na.rm = TRUE))
   )
  • This will be easier to answer with some [example data and expected output](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – neilfws Feb 10 '19 at 22:36
  • This is unclear. Please clarify and give a reproducible example which illustrates the core problem. – John Coleman Feb 10 '19 at 22:36
  • https://stackoverflow.com/questions/11952706/generate-a-dummy-variable might have what you need – JustGettinStarted Feb 10 '19 at 22:36
  • Using your new code, you don't need the bf$ in the summarise, and you can `group_by` instead of filtering, which will give you the means for both 0 and 1 at the same time. See my answer for how it will look – morgan121 Feb 10 '19 at 23:22

1 Answer


You can do this with the dplyr package:

Say this is your data (simplified):

set.seed(1)  # make the random example data reproducible
df <- data.frame(Dummy = sample(0:1, 10, replace = TRUE), V1 = rnorm(10, 10), V2 = rpois(10, 0.5))

This code will calculate the mean of each column, split by dummy:

library(dplyr)  # provides group_by() and summarise(), and re-exports %>%
df %>% 
   group_by(Dummy) %>% 
   summarise(Mean_V1 = mean(V1, na.rm = TRUE), 
             Mean_V2 = mean(V2, na.rm = TRUE))

You'll need to add a new argument to summarise for each column.
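To avoid writing one argument per column, here is a sketch assuming dplyr 1.0.0 or later, where `across()` is available. Inside a grouped `summarise()`, `across(everything(), ...)` covers every column except the grouping variable; the `Mean_` prefix is just an illustrative naming choice:

```r
library(dplyr)

set.seed(1)  # reproducible example data
df <- data.frame(Dummy = sample(0:1, 10, replace = TRUE),
                 V1 = rnorm(10, 10),
                 V2 = rpois(10, 0.5))

# group_by() splits the rows by Dummy; across(everything(), ...) then
# applies mean() to every column except the grouping column Dummy
means <- df %>%
  group_by(Dummy) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE),
                   .names = "Mean_{.col}"))
means
```

Adding more variables to the data then needs no change to the summarise call.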

Using base R you can use colMeans with subsetted data:

colMeans(df[df$Dummy==0, -1], na.rm = TRUE)
colMeans(df[df$Dummy==1, -1], na.rm = TRUE)
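The two colMeans calls above can also be done in one step for any number of groups. This is a sketch using base R's `split()` and `sapply()` (a swapped-in technique, not part of the original answer), assuming the same kind of simulated `df`:

```r
set.seed(1)  # reproducible example data
df <- data.frame(Dummy = sample(0:1, 10, replace = TRUE),
                 V1 = rnorm(10, 10),
                 V2 = rpois(10, 0.5))

# split() builds one data frame per Dummy value (dropping the Dummy column);
# sapply() applies colMeans() to each, giving a matrix with one row per
# variable and one column per group
means <- sapply(split(df[-1], df$Dummy), colMeans, na.rm = TRUE)
means
```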

Or you could combine them like this:

data.frame(Col=c("V1", "V2"), 
           Mean_0=colMeans(df[df$Dummy==0, -1]), 
           Mean_1=colMeans(df[df$Dummy==1, -1]))
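Since the question ultimately wants the difference between the two groups, a minimal sketch can add it as a column of the combined data frame. It assumes the same simulated `df`; `result` and `Diff` are illustrative names:

```r
set.seed(1)  # reproducible example data
df <- data.frame(Dummy = sample(0:1, 10, replace = TRUE),
                 V1 = rnorm(10, 10),
                 V2 = rpois(10, 0.5))

# Mean of every non-dummy column within each group
mean_0 <- colMeans(df[df$Dummy == 0, -1], na.rm = TRUE)
mean_1 <- colMeans(df[df$Dummy == 1, -1], na.rm = TRUE)

# One row per variable; Diff is the group-1 mean minus the group-0 mean
result <- data.frame(Col = names(mean_1),
                     Mean_0 = mean_0,
                     Mean_1 = mean_1,
                     Diff = mean_1 - mean_0)
result
```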
morgan121