-1

In R, I am trying to take interview scores currently in a data frame like this:

Interviewer  Applicant  Score
Int_9        App_5      3
Int_3        App_3      2
Int_1        App_2      9
Int_3        App_5      2
...

and reformat to this:

            AVG    Int_1   Int_2   Int_3   ...
App_3       2.0    NA      NA      2                  
App_5       2.5    NA      NA      2
App_2       9.0    9       NA      NA
...

This gets me close:

reshape(data, idvar="applicant", timevar="interviewer", direction="wide")

However, the interviewer names are not sorted alphabetically as I'd like. I suppose I could use the mean and merge functions to get the AVG column, but I don't know how to sort AVG from low to high (and have the rest of the entries come along for the ride). Also, how can I average over all the applicant scores without having to explicitly write the name of each applicant, i.e., how to not have to do this:

app_avg = c(mean(data$score[data$applicant=="App_1"]), mean(data$score[data$applicant=="App_2"]), mean(data$score[data$applicant=="App_3"]), ...)

Help, please?

  • `tapply(dat[,3],dat[2:1],I)??` – Onyambu Feb 05 '18 at 01:53
  • 2
    @akrun - whoops - i meant this one as very close to a duplicate - https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format?noredirect=1&lq=1 - (damn clipboard!) – thelatemail Feb 05 '18 at 02:24

1 Answers1

0

Here is a solution.

library(reshape)
library(tibble)

Recast data from "long" format to "wide" format:

> df1 <- cast(df0, formula = Applicant ~ Interviewer, value = "Score", add.missing = TRUE, fill = NA, fun.aggregate = sum)

  Applicant  Int_1   Int_2   Int_3   ...
1     App_5     NA      NA       2                  
2     App_3     NA      NA       2
3     App_2      9      NA      NA
...

Calculate the mean value of each row:

> df_avg <- data.frame(Applicant = df1[,1], AVG = rowMeans(df1[,-1], na.rm=TRUE))

  Applicant    AVG
1     App_3    2.5
2     App_5    2.0
3     App_2    9.0

Insert the row mean between Applicant and Int_1:

> df2 <- add_column(df1, AVG=df_avg[,2], .before = "Int_1")

  Applicant  AVG  Int_1   Int_2   Int_3   ...
1     App_5  2.5     NA      NA       2                  
2     App_3  2.0     NA      NA       2
3     App_2  9.0      9      NA      NA
...

Reorder rows based on AVG column:

> df3 <- df2[order(df2$AVG),]

  Applicant  AVG  Int_1   Int_2   Int_3   ...
2     App_3  2.0     NA      NA       2                  
1     App_5  2.5     NA      NA       2
3     App_2  9.0      9      NA      NA
...

Renumber row names to be consecutive numbers:

> rownames(df3) <- NULL

  Applicant  AVG  Int_1   Int_2   Int_3   ...
1     App_3  2.0     NA      NA       2                  
2     App_5  2.5     NA      NA       2
3     App_2  9.0      9      NA      NA
...