
I have a dataset with over 1000 levels of occupation type. I'd like to write a for loop that subsets the data and calculates the average prestige for each occupation.

Below is a small part of the dataset "Prestige":

   education income women  prestige census type
   13.11     12351  11.16  68.8     1113   prof
   12.26     25879  4.02   69.1     1130   wc
   12.77     9271   15.70  63.4     1171   bc
   11.42     8865   9.11   56.8     1175   prof
   14.62     8403   11.68  73.5     2111   chemist

I have started the for loop as follows:

  # One hard-coded branch per level: this cannot scale to ~1000 types
  for (row in seq_len(nrow(Prestige))) {
    type <- Prestige[row, "type"]
    # Inside subset(), `type` refers to the column, not the local variable
    if (type == "prof") {
      meanProf <- mean(subset(Prestige, type == "prof")$prestige)
    } else if (type == "bc") {
      meanBc <- mean(subset(Prestige, type == "bc")$prestige)
    } else if (type == "wc") {
      meanWc <- mean(subset(Prestige, type == "wc")$prestige)
    } else if (type == "chemist") {
      meanChemist <- mean(subset(Prestige, type == "chemist")$prestige)
    }
  }

Since I have to create subsets for about 1000 different levels of occupation (type) and calculate the mean of "prestige" for each subset, writing one branch per level like this is not workable.

Is there a way to write a loop that creates a subset for each level and calculates the average "prestige" of each subset?
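A minimal sketch of such a loop, assuming the data frame is called `Prestige` and has a numeric `prestige` column and a `type` column (character or factor). Only the five sample rows above are rebuilt inline so the block runs on its own; with the real data you would skip that part. Instead of one variable per level, each mean is stored in a single named vector indexed by level, so the same code works for 4 types or 1000.

```r
# Rebuild the five sample rows shown above (stand-in for the full dataset)
Prestige <- data.frame(
  education = c(13.11, 12.26, 12.77, 11.42, 14.62),
  income    = c(12351, 25879, 9271, 8865, 8403),
  women     = c(11.16, 4.02, 15.70, 9.11, 11.68),
  prestige  = c(68.8, 69.1, 63.4, 56.8, 73.5),
  census    = c(1113, 1130, 1171, 1175, 2111),
  type      = c("prof", "wc", "bc", "prof", "chemist")
)

# One entry per distinct occupation type, ignoring missing types
type_levels <- unique(as.character(Prestige$type))
type_levels <- type_levels[!is.na(type_levels)]

# Pre-allocate a named numeric vector to hold one mean per level
mean_prestige <- setNames(numeric(length(type_levels)), type_levels)

for (lvl in type_levels) {
  # (a) select the rows of this type; which() skips rows whose type is NA
  rows <- Prestige[which(Prestige$type == lvl), ]
  # (b) average prestige of the subset, (c) store it under the level name
  mean_prestige[lvl] <- mean(rows$prestige, na.rm = TRUE)
}

print(mean_prestige)
```

Looking up a single result is then `mean_prestige["prof"]`.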

  • Do you really want a for loop for that? You can use the `dplyr` package to group by `type` values and calculate various statistics (like the average) for each `type` value. Have a look here: http://genomicsclass.github.io/book/pages/dplyr_tutorial.html especially the last section (function `group_by`) – AntoniosK Jan 22 '18 at 15:36
  • Simply use `by`: `output_list <- by(Prestige, Prestige$type, FUN=function(df) mean(df$prestige))` – Parfait Jan 22 '18 at 15:42
  • If you *don't* want the result in a data frame, `with(Prestige, tapply(prestige, type, mean))` is easiest, but there are many other options at the duplicate. – Gregor Thomas Jan 22 '18 at 15:43
  • @Parfait thank you, this is a good alternative! – Lorenna Van Munnecom Jan 22 '18 at 15:53
  • @AntoniosK is there a way to group by type values using a for loop? – Lorenna Van Munnecom Jan 22 '18 at 15:54
  • There is, but at each iteration you have to (a) select a subset of your dataset based on that specific `type` value, (b) calculate the average, or anything else you want, and (c) save the results. Why not use less code and a package that was created for this type of process? – AntoniosK Jan 22 '18 at 16:00
  • @AntoniosK thank you for the advice – Lorenna Van Munnecom Jan 22 '18 at 16:01
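The loop-free suggestions from the comments can be compared side by side. This is a minimal sketch that rebuilds only the two columns it needs from the sample rows; the `tapply()` and `by()` calls are the ones quoted in the comments, and `aggregate()` is an additional base-R option (not mentioned in the thread) that returns a data frame. All three do the subset-then-average step per `type` internally, so no explicit loop is written.

```r
# Only the columns needed for the grouped mean (sample rows from the question)
Prestige <- data.frame(
  prestige = c(68.8, 69.1, 63.4, 56.8, 73.5),
  type     = c("prof", "wc", "bc", "prof", "chemist")
)

# Named array of means, one per type, levels sorted alphabetically
means_tapply <- with(Prestige, tapply(prestige, type, mean))

# "by" object holding the same per-type means
means_by <- by(Prestige, Prestige$type, FUN = function(df) mean(df$prestige))

# Data frame with one row per type: columns `type` and `prestige` (the mean)
means_agg <- aggregate(prestige ~ type, data = Prestige, FUN = mean)

print(means_tapply)
print(means_agg)
```

The choice mostly comes down to the shape you want back: `tapply()` gives a named vector-like array, `aggregate()` gives a data frame that is easy to merge or write out.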
