I'm using the following code to compute the mean of several columns in my data:

df %>% rowwise() %>% mutate(avg=mean(Responsiveness:Translation, na.rm=TRUE))

I keep getting the error:

Error: NA/NaN argument

I know that some of my data has N/A values, but why doesn't na.rm=TRUE deal with them?

K-P
  • Please provide some example data and expected output. [This](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful for that. – akrun Aug 14 '15 at 14:55
  • The error might be because you have a character 'N/A' value instead of real `NA`. It would also change the column class from numeric to 'character' or 'factor' depending upon whether you used `stringsAsFactors=FALSE` or not in the `read.csv`. When you read the dataset with `read.csv/read.table`, you can specify `na.strings='N/A'` so that it will be read as real NAs and the columns will be numeric. – akrun Aug 14 '15 at 15:38
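The `na.strings` suggestion in the comment above can be sketched as follows. The CSV text and its values are made up for illustration; only `read.csv`, `stringsAsFactors` and `na.strings` come from the comment.

```r
# Hypothetical CSV content in which missing values are written as the text "N/A"
csv_text <- "Responsiveness,Translation
1,4
N/A,5
3,N/A"

# Default read: "N/A" is kept as text, so the columns are not numeric
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$Responsiveness)    # "character"

# With na.strings, "N/A" is read as a real NA and the columns stay numeric
clean <- read.csv(text = csv_text, na.strings = "N/A")
class(clean$Responsiveness)  # "integer"

# Row means now work, skipping the NA in each row
row_avg <- rowMeans(clean, na.rm = TRUE)
row_avg
```

Checking `class()` or `str()` on the columns right after reading is a quick way to see whether stray text values have turned a numeric column into character or factor.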

1 Answer

Here is one option using rowMeans within dplyr. We select the columns from 'Responsiveness' to (:) 'Translation', mutate the dataset to create the column 'avg' with rowMeans, specifying na.rm=TRUE to remove the NA values, and bind (bind_cols) the result with the remaining columns of the original dataset, i.e. the columns of df that are not found in the mutated dataset (.). We can use setdiff to get those column names.

 library(dplyr)
 df %>% 
     select(Responsiveness:Translation) %>% 
     mutate(avg= rowMeans(., na.rm=TRUE)) %>% 
     bind_cols(df[setdiff(names(df), names(.))] , .)

But rowMeans can be used without any external package. In base R, we match the columns 'Responsiveness' and 'Translation' against the column names of the original dataset. This gives the numeric index of those columns. We can then build the sequence (:) from the start (i1[1]) to the end (i1[2]) and use rowMeans on that subset of the dataset.

 i1 <- match( c('Responsiveness', 'Translation'), names(df))
 df$avg <- rowMeans(df[i1[1]:i1[2]], na.rm=TRUE)
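As a quick check of the base R steps above, here is a self-contained sketch that reuses the sample data from the answer's "data" section and prints the intermediate index. Nothing here is new API; it only makes the values visible.

```r
# Sample data copied from the answer's "data" section
df <- data.frame(V1 = 1:10, Responsiveness = 1:10, V2 = c(2, NA, 4:11),
                 V3 = 3:12, Translation = 4:13)

# Positions of the first and last columns of the range
i1 <- match(c("Responsiveness", "Translation"), names(df))
i1    # 2 5

# Row means over columns 2 to 5, skipping any NA within each row
df$avg <- rowMeans(df[i1[1]:i1[2]], na.rm = TRUE)

# Row 2 averages 2, 4 and 5 (the NA in V2 is dropped)
first_three <- unname(df$avg[1:3])
first_three
```

Note that this relies on the columns being adjacent in the data frame: every column positioned between 'Responsiveness' and 'Translation' is included in the mean.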

We can also remove some steps from the above dplyr code if we use 'i1':

 df %>%
      mutate(avg= rowMeans(.[i1[1]:i1[2]], na.rm=TRUE))

NOTE: I am using dplyr_0.4.1.9000 on R 3.2.1. When there are no NA values, the OP's code gives the same output as rowMeans. But if there is an NA value, I get a different value: for the 2nd row in the example, I get 3.5 instead of 3.66667. I do not get any error, though.
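One likely explanation for both the 3.5 and the OP's error, offered as an interpretation rather than something the answer states: inside `mutate()`, `Responsiveness:Translation` is evaluated as the ordinary `:` sequence operator on the two cell values, not as a column range. For row 2 of the example that is `2:5`, so the NA in V2 is never seen and `na.rm` has nothing to remove. If either endpoint is itself NA, `:` fails before `mean()` is called. The sketch below uses plain vectors holding the row 2 values; the variable names are only for illustration.

```r
# Row 2 of the example data: Responsiveness = 2, V2 = NA, V3 = 4, Translation = 5
responsiveness <- 2
translation <- 5

# ':' builds the sequence 2, 3, 4, 5; the columns in between are ignored
seq_mean <- mean(responsiveness:translation, na.rm = TRUE)
seq_mean    # 3.5

# The mean of the actual row values, ignoring the NA
row_mean <- mean(c(2, NA, 4, 5), na.rm = TRUE)
row_mean    # 3.666667

# An NA endpoint makes ':' itself fail, which na.rm cannot prevent
msg <- tryCatch(NA:5, error = function(e) conditionMessage(e))
msg         # "NA/NaN argument"
```

This would also explain why the two approaches agree when the columns hold consecutive integers with no NA, as in the other rows of the example.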

data

set.seed(24)
df <- data.frame(V1=1:10, Responsiveness=1:10, V2= c(2, NA, 4:11), 
              V3=3:12, Translation=4:13)
akrun