I'm fairly new to R, and I know there are a lot of answers to this in various places.
Although I welcome suggestions on how to achieve this, my question is more about why this operation is not simpler. If it is simpler, I'd love to know how to do it, because I've been searching for a while; please point me to the right post or resource.
I have a dataset, say it looks like this:
v1 <- runif(5, 1, 7)
v2 <- runif(5, 1, 7)
v3 <- runif(5, 1, 7)
v4 <- runif(5, 1, 7)
v5 <- runif(5, 1, 7)
df <- data.frame(v1, v2, v3, v4, v5)
Now imagine that instead of 5 variables I have a thousand.
I want to compute the mean of v2 through v4 for each row and store it in a new column, so that each row has its own mean value. I would call this "averaging across rows", but I realize there may be a different way to describe it.
For each row, I want the average to be computed from all available values in that row. If a person happens not to have answered a question (e.g. blank or NA), I still want that person to be included.
I don't want to have to count the columns in order to call them; I know the names of the variables. I don't want to type several lines of code like they do in this post or in this post.
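To make the missing-data requirement concrete, here is a minimal sketch using a hypothetical three-person data frame called `toy` (not my real data). It assumes that `rowMeans()` with `na.rm = TRUE` is an acceptable way to average only the values that are present in each row:

```r
# Hypothetical toy data: the second person skipped item v3
toy <- data.frame(
  v2 = c(2, 4, 6),
  v3 = c(4, NA, 6),
  v4 = c(6, 2, 6)
)

# Default behaviour: any row containing an NA gets an NA mean
default_means <- rowMeans(toy)

# With na.rm = TRUE, each row is averaged over its non-missing values only
available_means <- rowMeans(toy, na.rm = TRUE)

print(default_means)    # 4 NA 6
print(available_means)  # 4  3 6
```

The second person is still included; their mean is simply based on two items instead of three.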
This is such a common operation in social sciences and I have a feeling it should be (or it is) simpler. If it is simpler, I'm not sure why I'm unable to find a simpler solution. In SPSS, for example, I would type something like:
COMPUTE newvar = mean(v2 to v4).
execute.
How do I do this in R?
My first intuition was to try something like this (which does not work):
df$newvar <- rowMeans(df, nat1:nat6)
I’ve been able to achieve my desired result with the following code:
library(dplyr)  # select() comes from the dplyr package
itemstouse <- select(df, v2:v4)
df$newvar <- rowMeans(itemstouse)
Or I could include it in one line like this:
df$newvar <- rowMeans(select(df, v2:v4))
But that still requires three operations. It seems like it should be simpler and I'm confused as to why I'm unable to find a solution as simple as the SPSS script.
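For comparison, the same computation can be sketched in base R without dplyr. This assumes the example columns `v1` to `v5` from above; `subset()` accepts a range of column names in its `select` argument, and `na.rm = TRUE` is an assumption added here to cover the missing-answer case. The `set.seed()` call is only there to make the sketch reproducible.

```r
# Rebuild the example data (seed chosen arbitrarily for reproducibility)
set.seed(1)
df <- data.frame(
  v1 = runif(5, 1, 7),
  v2 = runif(5, 1, 7),
  v3 = runif(5, 1, 7),
  v4 = runif(5, 1, 7),
  v5 = runif(5, 1, 7)
)

# Select columns v2 through v4 by name and average each row,
# ignoring missing values
df$newvar <- rowMeans(subset(df, select = v2:v4), na.rm = TRUE)

print(df)
```

It is still a selection nested inside `rowMeans()`, but the columns are addressed by name rather than by position, which is the closest I can get to `mean(var2 to var4)`.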
I admit, I am a noob when it comes to R, but some things should be fairly intuitive. ggplot is very intuitive, for example. And many things in R are quite easy to learn, but this one is tripping me up for some reason so I'd appreciate your input.