I have a dataframe that lists studentnumber <- c(1, 2, 3, ..., n) and schoolnumber <- c(1, 1, 2, 3, 4, 4), so pupil 1 is in school 1, pupil 2 is in school 1, pupil 3 is in school 2, and so on.

I have the socioeconomic status (SES) of each pupil, and I want to add a new column containing each pupil's SES minus the mean SES of that pupil's school. The code for this is apparently:

mydata$meansocialeconomicstatus <- with(mydata, tapply(ses, schoolnumber, mean))

But I receive an error, because tapply returns each school's mean only once instead of repeating it for every pupil in that school. The new column therefore has fewer rows than the dataframe.
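A minimal sketch that reproduces the mismatch. The `ses` values here are made up for illustration and are not my real data; only the column names and the `schoolnumber` vector come from the description above.

```r
# Hypothetical toy data: six pupils in four schools, with invented ses values.
mydata <- data.frame(
  studentnumber = 1:6,
  schoolnumber  = c(1, 1, 2, 3, 4, 4),
  ses           = c(10, 20, 30, 40, 50, 70)
)

# tapply() returns one mean per school, not one per pupil.
school_means <- with(mydata, tapply(ses, schoolnumber, mean))
length(school_means)  # number of schools
nrow(mydata)          # number of pupils

# Assigning the shorter vector as a column fails; try() captures the error
# so the rest of the script keeps running.
result <- try(mydata$meansocialeconomicstatus <- school_means, silent = TRUE)
inherits(result, "try-error")
```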

My question is: what could I add to make the mean SES repeat in the new column according to the school number?

Farshid Shekari
Rachel

1 Answer

You can use the dplyr package.

library(dplyr)

# Calculate the mean socialeconomicstatus per schoolnumber.
mydata2 <- mydata %>% 
            group_by(schoolnumber) %>%
            summarise(meansocialeconomicstatus = mean(ses))

# Join the mean socialeconomicstatus back to the original dataset based on schoolnumber.
left_join(mydata, mydata2, by = "schoolnumber")
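
If you would rather stay in base R, one possible alternative (a sketch, not the dplyr approach above) is `ave()`, which applies a function within groups and returns a result the same length as its input, so each school's mean is repeated for every pupil. The toy data and the `centeredses` column name are assumptions made up for illustration.

```r
# Hypothetical toy data: six pupils in four schools, with invented ses values.
mydata <- data.frame(
  studentnumber = 1:6,
  schoolnumber  = c(1, 1, 2, 3, 4, 4),
  ses           = c(10, 20, 30, 40, 50, 70)
)

# ave() computes the mean within each school and returns one value per row.
mydata$meansocialeconomicstatus <- with(mydata, ave(ses, schoolnumber, FUN = mean))

# School-centered SES: each pupil's SES minus the mean SES of their school.
# "centeredses" is a hypothetical column name.
mydata$centeredses <- mydata$ses - mydata$meansocialeconomicstatus

print(mydata)
```

In dplyr, using `mutate()` in place of `summarise()` after `group_by(schoolnumber)` should give the same per-row means without the separate join.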
Jonas Tundo
  • I have the same problem with this: it won't join them together, as one has a lot fewer levels than the other. – Rachel Apr 18 '15 at 11:22
  • Did you try the exact same thing as written above? The dataframes don't have to be of equal size. E.g. run `left_join(iris,iris[1,],by="Species")`. If you want more specific help, post a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jonas Tundo Apr 18 '15 at 11:27
  • Double checked how I coded it - I'd miscopied the left_join - now it works! Brilliant. Thank you! – Rachel Apr 18 '15 at 11:45