I read in a csv file via:

data <- read.csv("airbnb.csv", header = TRUE, sep = ",")

data has over 100 variables, and I need to calculate the mean of each one. In effect, I need to automate the following:

mean(data$variable1)
mean(data$variable2)

....

Is there a nice way to do this, e.g. with a loop?

1 Answer

You can use apply() or, as @akrun mentioned in a comment, colMeans(). The latter is optimized for exactly this case, so it will likely be faster than apply() on large datasets.

You mentioned that your data contains columns of multiple types and that you want only the numeric ones. That's easy enough: identify the numeric columns beforehand, which can be done using sapply() with is.numeric().

# Select numeric columns (drop = FALSE keeps a data frame even if only one column is numeric)
data.numcols <- data[, sapply(data, is.numeric), drop = FALSE]

# Using apply
all.means <- apply(data.numcols, 2, mean)

# Using colMeans
all.means <- colMeans(data.numcols)
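To check that both approaches agree, here is a minimal sketch on a small made-up data frame. The name toy and its columns are hypothetical stand-ins for the real airbnb.csv data, and drop = FALSE is added as a precaution so the selection stays a data frame.

```r
# Hypothetical toy data standing in for the real CSV (names are made up)
toy <- data.frame(
  price  = c(100, 150, 200),
  nights = c(1L, 3L, 5L),
  city   = c("Berlin", "Paris", "Rome"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; the character column "city" is dropped
toy.numcols <- toy[, sapply(toy, is.numeric), drop = FALSE]

# Both calls return a named numeric vector with one mean per column
means.apply    <- apply(toy.numcols, 2, mean)
means.colmeans <- colMeans(toy.numcols)

print(means.colmeans)
print(all.equal(means.apply, means.colmeans))
```

Both results are named by column, so an individual mean can be looked up with, for example, means.colmeans[["price"]].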

If your columns contain NA, you can exclude NA values like so:

# Using apply
all.means <- apply(data.numcols, 2, function(x) mean(x, na.rm = TRUE))

# Using colMeans
all.means <- colMeans(data.numcols, na.rm = TRUE)
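As a rough illustration of what na.rm changes, the sketch below uses a small made-up data frame (toy.na and its columns are hypothetical). Note that a column consisting entirely of NA yields NaN rather than a number when na.rm = TRUE, since there is nothing left to average.

```r
# Hypothetical data: "a" has one NA, "b" is entirely NA, "c" is complete
toy.na <- data.frame(
  a = c(1, NA, 3),
  b = c(NA_real_, NA_real_, NA_real_),
  c = c(2, 4, 6)
)

# Without na.rm, any column containing an NA has an NA mean
with.na <- colMeans(toy.na)

# With na.rm = TRUE, NAs are skipped; the all-NA column becomes NaN
without.na <- colMeans(toy.na, na.rm = TRUE)

print(with.na)
print(without.na)
```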
Alex A.