
I am not used to R, so to practice I am trying to do in R everything that I used to do in SPSS.

In my dataset each row is a case. The columns are survey questions (1 per question).

Say I have columns "A1" up to "A6", "B1" to "B6", and so on.

I just finished calculating the mean for each person across A1 to A6:

```r
data$meandata <- rowMeans(subset(data, select=c(A1:A6), na.rm=TRUE))
```

How do I calculate the standard deviation of `meandata`?

iamnarra

  • Is the `sd()` enough? – storaged Mar 14 '18 at 18:35
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 14 '18 at 18:37
  • `apply(subset(data, select = A1:A6), 1, sd, na.rm=TRUE) `. – Rui Barradas Mar 14 '18 at 18:47
  • Your `rowMeans` call is wrong, the parenthesis must close `subset`, not after `na.rm`. – Rui Barradas Mar 14 '18 at 18:48
  • @MrFlick Thanks. As someone who is still learning the ropes of R, it took me 30 minutes just to type out this simple question, which was clearly not enough for the community. – iamnarra Mar 14 '18 at 21:05
  • Well, hopefully the link will provide you with useful information to make writing your next question easier. It's less useful to talk about code abstractly than it is to have a real example to try stuff out on. – MrFlick Mar 14 '18 at 21:07
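Putting the comments together, here is a minimal sketch of the corrected `rowMeans()` call (the closing parenthesis moved so that `na.rm` goes to `rowMeans()`, not `subset()`) and the two possible readings of "the standard deviation of meandata". The small data frame is a hypothetical stand-in for the real survey, and `sd_of_means` and `sddata` are illustrative names.

```r
# Hypothetical toy data: 4 respondents, items A1..A6, with one missing answer (row 4, A3)
data <- data.frame(
  A1 = c(1, 2, 3, 4), A2 = c(2, 3, 4, 5), A3 = c(3, 4, 5, NA),
  A4 = c(4, 5, 1, 2), A5 = c(5, 1, 2, 3), A6 = c(1, 2, 3, 4)
)

# Corrected call: close subset() first, then pass na.rm to rowMeans()
data$meandata <- rowMeans(subset(data, select = A1:A6), na.rm = TRUE)

# Reading 1: a single number, the sd of the per-person means (what sd() alone gives)
sd_of_means <- sd(data$meandata)

# Reading 2: one sd per person, across that person's A1..A6 answers, skipping NAs
data$sddata <- apply(subset(data, select = A1:A6), 1, sd, na.rm = TRUE)
```

The `na.rm = TRUE` after `sd` in the `apply()` call is forwarded to `sd()` for every row, so respondent 4's missing A3 is ignored rather than turning that row's result into `NA`.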

1 Answer

Hey, the easiest way to do this is with the `apply()` function.

Assume you have 25 rows of data and 6 columns labeled A1 through A6.

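```r
# 25 rows; each column drawn from a normal distribution with mean 50 and sd 4
data <- data.frame(A1 = rnorm(25, 50, 4), A2 = rnorm(25, 50, 4), A3 = rnorm(25, 50, 4),
                   A4 = rnorm(25, 50, 4), A5 = rnorm(25, 50, 4), A6 = rnorm(25, 50, 4))
```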
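Because `rnorm()` draws new numbers on every run, the sds below will differ slightly each time. A small sketch, assuming you want identical numbers across runs: call `set.seed()` first (the seed `42` is arbitrary) and build the same 25 × 6 structure with `replicate()` instead of typing each column.

```r
# Fix the random number generator so the "random" data is identical on every run
set.seed(42)  # arbitrary seed; any integer works

# Same structure as above: 25 rows, columns A1..A6, each drawn from N(mean = 50, sd = 4)
data <- as.data.frame(replicate(6, rnorm(25, mean = 50, sd = 4)))
names(data) <- paste0("A", 1:6)
```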

You can use `apply()` to find the standard deviation of each row across columns 1 through 6 with the code below. The first argument is your data object. The second argument (`MARGIN`) is 1 for rows or 2 for columns; this is the direction in which the function is applied to the data frame. The third argument is the function you wish to apply, such as `mean` or, in this case, the standard deviation (`sd`). Any further arguments, such as `na.rm = TRUE`, are passed on to that function.

```r
apply(data[, 1:6], 1, sd)  # one sd per row, across columns 1 to 6
```
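To see what `apply(..., 1, sd)` actually computes, the sketch below compares it with the sample standard deviation written out by hand (denominator n - 1, which is what `sd()` uses). The data frame `df` and the helper `manual_sd` are hypothetical, chosen only so the first row is easy to check: 1, 2, 3, 4 has mean 2.5 and squared deviations summing to 5, so its sd is sqrt(5/3).

```r
# Small hypothetical data frame: 3 rows, 4 columns
df <- data.frame(A1 = c(1, 4, 7), A2 = c(2, 5, 9), A3 = c(3, 6, 8), A4 = c(4, 8, 10))

# Row-wise sd with apply(): MARGIN = 1 hands each row to sd() as a numeric vector
row_sd <- apply(df, 1, sd)

# The same thing written out: the sample sd uses the n - 1 denominator
manual_sd <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))
row_sd_manual <- apply(df, 1, manual_sd)

all.equal(row_sd, row_sd_manual)  # TRUE
```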

Indexing can be used to limit which rows or columns are passed to `apply()`. Put a vector of the row numbers and/or a vector of the column numbers you are interested in inside the brackets after your data object:

```r
data[row.vector, column.vector]  # row.vector and column.vector are placeholders for your index vectors
```
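The index vectors do not have to be column positions; column names (or a pattern on the names) also work, which is less fragile when the survey has blocks like A1..A6 and B1..B6. A sketch on a hypothetical 5-respondent survey with items A1..A3 and B1..B3; `survey`, `a_cols` and `b_cols` are illustrative names.

```r
# Hypothetical survey: 5 respondents, items A1..A3 and B1..B3 scored 1..5
set.seed(1)
survey <- as.data.frame(matrix(sample(1:5, 30, replace = TRUE), nrow = 5))
names(survey) <- c(paste0("A", 1:3), paste0("B", 1:3))

# Rows by position, columns by name
a_cols <- paste0("A", 1:3)           # "A1" "A2" "A3"
sub1 <- survey[1:2, a_cols]

# Columns by pattern: every column whose name starts with "B"
b_cols <- grep("^B", names(survey))  # integer positions 4, 5, 6
sub2 <- survey[, b_cols]

# Row-wise sd over the named block
a_sd <- apply(survey[, a_cols], 1, sd)
```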

Say you only want the sd of each row across the first 3 columns:

```r
apply(data[, 1:3], 1, sd)
```

Now let's see the sd of each of rows 1 through 10, across columns 4 through 6:

```r
apply(data[1:10, 4:6], 1, sd)
```
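The question mentions several blocks ("A1" to "A6", "B1" to "B6", and so on). One way to handle them all is to loop over the block prefixes and add a per-person mean and sd column for each block. This is a sketch under the assumption that every item column is named as its prefix followed by digits; the data and the new column names (`A_mean`, `A_sd`, ...) are hypothetical.

```r
# Hypothetical data: 4 respondents, blocks A1..A6 and B1..B6
set.seed(7)
data <- as.data.frame(matrix(rnorm(48, mean = 50, sd = 4), nrow = 4))
names(data) <- c(paste0("A", 1:6), paste0("B", 1:6))

for (prefix in c("A", "B")) {
  # Columns belonging to this block, e.g. "A1".."A6"; the pattern assumes prefix + digits,
  # so summary columns added on earlier iterations (e.g. "A_mean") are not picked up
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  block <- data[, cols]
  # Per-respondent summaries, skipping missing answers
  data[[paste0(prefix, "_mean")]] <- rowMeans(block, na.rm = TRUE)
  data[[paste0(prefix, "_sd")]]   <- apply(block, 1, sd, na.rm = TRUE)
}
```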

Just for good measure, let's find the sd of each column:

```r
apply(data, 2, sd)  # one sd per column
```

Notice that each column's sd is close to 4, which is the `sd` I specified in `rnorm()` when I generated the pseudo-random data for columns A1 through A6.
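One caveat for real survey data: `apply()` first converts the data frame to a matrix, and if any column is text (a respondent ID, say) that matrix becomes a character matrix. A sketch of an alternative, assuming a hypothetical `id` column: `sapply()` works on the data frame column by column, so you can keep only the numeric columns and take the sd of each.

```r
# Hypothetical data with a text ID column next to the numeric items
set.seed(3)
data <- data.frame(id = paste0("R", 1:25),
                   A1 = rnorm(25, 50, 4), A2 = rnorm(25, 50, 4))

# apply() would first turn the whole data frame into a matrix;
# with the text column present, that matrix is character, not numeric
is.character(as.matrix(data))  # TRUE

# sapply() goes column by column, so select the numeric columns and take each sd
num_cols <- sapply(data, is.numeric)
col_sd <- sapply(data[num_cols], sd)
```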

Hope this helps

THATguy