R: How to automatically calculate the mean of all variables of a dataset?

Question

I read in a csv file via:

data <- read.csv("airbnb.csv", header = TRUE, sep = ",")

data has over 100 variables and I need to calculate the mean of all of them. Actually I need to automate the following:

mean(data$variable1)
mean(data$variable2)

....

Is there any nice way I can do this? E.g. with a loop?

`colMeans(data)` would be easier and if there are `NA`, you can use `na.rm=TRUE` — akrun, Apr 27 '15 at 15:55

Alex A. · Answer 1 · 2015-04-27T17:16:02.347

4

You can use apply() or, as @akrun mentioned in a comment, colMeans(). The latter is optimized for this situation so it will likely perform better than the former for large datasets.

You mentioned that your data has columns of several types and that you only want the means of the numeric ones. That's easy enough: identify the numeric columns beforehand, which can be done using sapply() with is.numeric().

# Select numeric columns (drop = FALSE keeps a data frame even if only one column is numeric)
data.numcols <- data[, sapply(data, is.numeric), drop = FALSE]

# Using apply
all.means <- apply(data.numcols, 2, mean)

# Using colMeans
all.means <- colMeans(data.numcols)

If your columns contain NA, you can exclude NA values like so:

# Using apply
all.means <- apply(data.numcols, 2, function(x) mean(x, na.rm = TRUE))

# Using colMeans
all.means <- colMeans(data.numcols, na.rm = TRUE)
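As a minimal sketch of the steps above, here they are on a small made-up data frame. The name `toy` and its columns `price`, `guests`, and `city` are hypothetical stand-ins for the real airbnb.csv contents, which are not shown in the question.

```r
# Hypothetical toy data standing in for airbnb.csv (column names are made up)
toy <- data.frame(
  price  = c(100, 150, NA, 80),                              # num, with one NA
  guests = c(2L, 4L, 3L, 1L),                                # int
  city   = factor(c("Berlin", "Paris", "Berlin", "Rome"))    # factor, to be skipped
)

# Keep only the numeric (int and num) columns; is.numeric() is FALSE for factors.
# drop = FALSE keeps a data frame even if just one column qualifies.
toy.numcols <- toy[, sapply(toy, is.numeric), drop = FALSE]

# Column means, ignoring NA values
toy.means <- colMeans(toy.numcols, na.rm = TRUE)
print(toy.means)
# price is 110 (mean of 100, 150, 80) and guests is 2.5
```

The factor column `city` never reaches colMeans(), so the "not numeric" error reported in the comments should not occur here.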

edited Apr 27 '15 at 17:16

answered Apr 27 '15 at 15:57

Alex A.


For both options I am getting an error message saying that the argument is neither a numeric nor a boolean value -> output is NA – Nico Kriegschmichnet Apr 27 '15 at 16:55
the variables have different types: factor, int, and num (so I want to skip the factor variables and only calculate the mean for the int and num variables) – Nico Kriegschmichnet Apr 27 '15 at 16:57
