I want to write a function that creates a new column with the row means of Columns 1-3, but only if more than 2 of the questions in Columns 1-3 were answered in that row; otherwise it should print 'N'.

Here is my dataframe:

test <- data.frame(Manager1 = c(1, 3, 3), Manager2 = c(3, 4, 1), Manager3 = c(NA , 4, 2), Team1 = c(3, 4, 1))

Desired output:

Manager1 Manager2 Manager3 Team1 mean_score
       1        3       NA     3          N
       3        4        4     4    3.66667
       3        1        2     1          2

My code is as follows, but it's not working:

#create function
mean_score <- function(x) {
  for (i in 1:nrow(test)){
    if (sum(test[i, x] != "NA", na.rm = TRUE) >2){
      test$mean_score[i] <- rowMeans(test[i, x], na.rm = TRUE)
    } else 
      test$mean_score[i] <- print("N")
  }
}

#compute function
mean_score(1:3)

What am I missing? Suggestions on better code are welcome too.

3 Answers

You can simply use rowMeans, which returns NA for any row that contains an NA. With three columns, that is equivalent to "only if more than 2 questions for Columns 1-3 per row were answered".

test$mean_score <- rowMeans(test[,1:3])
#  Manager1 Manager2 Manager3 Team1 mean_score
#1        1        3       NA     3         NA
#2        3        4        4     4   3.666667
#3        3        1        2     1   2.000000
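If the threshold ever needs to be something other than "all three answered", one option is to count the answered questions per row and blank out the rows that fall short. The following is only a minimal sketch of that idea; the names `min_answered` and `answered` are illustrative assumptions and do not come from the question.

```r
test <- data.frame(Manager1 = c(1, 3, 3), Manager2 = c(3, 4, 1),
                   Manager3 = c(NA, 4, 2), Team1 = c(3, 4, 1))

cols <- 1:3          # columns to average
min_answered <- 2    # hypothetical threshold: need MORE than this many answers

# number of non-missing answers in each row
answered <- rowSums(!is.na(test[, cols]))

# mean of the available answers, set to NA where too few were answered
test$mean_score <- ifelse(answered > min_answered,
                          rowMeans(test[, cols], na.rm = TRUE),
                          NA_real_)
test
```

With `min_answered <- 2` and three columns this should agree with the plain `rowMeans(test[,1:3])` call above; lowering the threshold would let rows with a single missing answer keep a mean.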
GKi

While GKi's answer is simpler and is the one you should use, here is how I changed your code so that it works.

Generally, when writing a function you want the data frame itself (in this case test) to be the input, and you want the function to return the modified data frame. Changes made to test inside a function only affect a local copy, so your original function never changed the test in your workspace.

Another important point: it is easier to build a vector of values first and then attach that vector to the data frame, as I do in the code below. Create the vector before the loop so there is something to fill in. R works more smoothly when a complete vector of matching length is attached to the data frame than when a new column is filled in cell by cell.

Also, you don't need print() to put a character value into a vector.

Hope this helps explain why your function was having issues, but frankly GKi's answer is better for general R use!

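mean_score <- function(x, cols = 1:3) {
  # build the new column as a vector first, one entry per row
  mean_score <- character(nrow(x))
  for (i in seq_len(nrow(x))) {
    # count the answered (non-NA) questions among the chosen columns only
    if (sum(!is.na(x[i, cols])) > 2) {
      mean_score[i] <- rowMeans(x[i, cols], na.rm = TRUE)
    } else {
      mean_score[i] <- "N"
    }
  }
  x$mean_score <- mean_score
  return(x)
}

mean_score(test)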
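The point about passing the data frame in and returning it can be illustrated with a small sketch. Everything here (`scores_df`, `add_col_lost`, `add_col_kept`) is a hypothetical name made up for the illustration, not something from the question.

```r
scores_df <- data.frame(a = c(1, 2))

# hypothetical helper: changes a local copy and returns nothing useful
add_col_lost <- function() {
  scores_df$b <- scores_df$a * 2   # only a function-local copy is modified
  invisible(NULL)
}

# hypothetical helper: takes the data frame and returns the changed copy
add_col_kept <- function(d) {
  d$b <- d$a * 2
  d
}

add_col_lost()
"b" %in% names(scores_df)    # the data frame in the workspace is unchanged

scores_df <- add_col_kept(scores_df)
"b" %in% names(scores_df)    # the returned copy was assigned back
```

The second pattern is the one used in the function above: the caller decides whether to keep the result by assigning it, for example `test <- mean_score(test)`.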
Tim N
  • Is that in regards to the redundancy of the function name and the vector name `mean_score`? Guess that's not best practice, but the code worked for me so I'm not sure what you're pointing out? – Tim N May 04 '20 at 16:28
  • Sorry, my mistake. I ran the code and it failed, but I realise I was reading the wrong `test`. Will delete my comment. – Peter May 04 '20 at 16:31

I think it is not ideal to put a character together with a numeric value, since it will convert the whole column into character. However, if that is what you want:

my_sum <- function(x, min = 2) {
  s <- mean(x, na.rm = TRUE) # get the mean
  no_na <- sum(!is.na(x)) # count the number of non-NAs
  if (no_na > min) s else "N" # return the mean if there are enough non-NAs
}
test$mean <- apply(test[, 1:3], 1, my_sum)

test

  Manager1 Manager2 Manager3 Team1             mean
1        1        3       NA     3                N
2        3        4        4     4 3.66666666666667
3        3        1        2     1                2

str(test)
'data.frame':   3 obs. of  5 variables:
 $ Manager1: num  1 3 3
 $ Manager2: num  3 4 1
 $ Manager3: num  NA 4 2
 $ Team1   : num  3 4 1
 $ mean    : chr  "N" "3.66666666666667" "2"
desval
  • Thanks for pointing out that it is best to avoid characters with numeric in the same column! If we were to use numeric values, would it just be else {NA}? Also, would you recommend apply over for loops? – firefly3224 May 04 '20 at 14:46
  • Yes, you can just use NA instead of "N". I think the apply version is less cumbersome and easier to read. There are plenty of posts about loops vs. the apply family on this website; here is one: https://stackoverflow.com/questions/42393658/lapply-vs-for-loop-performance-r – desval May 04 '20 at 15:08
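As a sketch of the NA variant discussed in these comments, the function below returns `NA_real_` instead of "N" so that the new column stays numeric. The names `my_mean` and `min_answered` are assumptions chosen for this illustration.

```r
test <- data.frame(Manager1 = c(1, 3, 3), Manager2 = c(3, 4, 1),
                   Manager3 = c(NA, 4, 2), Team1 = c(3, 4, 1))

# hypothetical variant of my_sum that returns NA instead of "N"
my_mean <- function(x, min_answered = 2) {
  if (sum(!is.na(x)) > min_answered) {
    mean(x, na.rm = TRUE)   # enough answers: mean of the available ones
  } else {
    NA_real_                # too few answers: numeric missing value
  }
}

test$mean <- apply(test[, 1:3], 1, my_mean)
str(test$mean)
```

Because every branch returns a number, `apply` should hand back a numeric vector, and later calls such as `mean(test$mean, na.rm = TRUE)` keep working without converting from character.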