0

My data set in R looks like this

      Id Gender   Age Participate    Q1   Q10    Q2    Q3    Q4
*  <int>  <chr> <int>       <int> <chr> <chr> <chr> <chr> <chr>
1     16   Male    20           1     0     1     0     1     1
2     17   Male    40           1     1     0     0     0     0
3     18   Male    33           1     1     0     0     0     0
4     19   Male    18           1     1     0     0     0     0
5     20   Male    24           1     0     0     1     0     0
6     21 Female    42           1     0     0     1     0     0
7     22 Female    19           1     1     0     0     1     1
8     28 Female    49           1     0     1     1     0     0
9     29 Female    17           1     1     0     1     0     0
10    31   Male    18           1     1     0     1     0     0

I want to add together the numbers in Q1 - Q10 to create a score. How can I do this? Thanks

Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
  • Just do `df1$Total <- rowSums(df1[6:10])` or with `dplyr` `df1 %>% mutate(Total = rowSums(.[grep("^Q\\d+", names(.))]))` Based on the column types, you need to convert the `character` class to `numeric` with `as.numeric` – akrun Sep 25 '16 at 12:33

2 Answers2

0

Suppose you saved your data into the R variable dat To add Q1 with Q10 you could simply use:

score <- as.numeric(dat$Q1) + as.numeric(dat$Q10)

If you want to select more than two variables (e.g., Q1, Q2, Q3, Q10) you could do (similarly to Akrun suggestion):

score <- rowSums(dat[, paste('Q', c(1:3, 10), sep = '')])

Where rowSums is a function summing the values of the selected columns and paste creates the names of the columns to select (i.e., Q1, Q2, Q3, and Q10). As suggested by Akrun you should transform your columns with character data-type (or factor) to the numeric data type before calling rowSums. If the 'transformation' from numeric to character happens when you are importing the data please read this post explaining that sometimes using stringsAsFactors=FALSE when importing data into R might resolve similar issues.

Community
  • 1
  • 1
HelloWorld
  • 697
  • 10
  • 16
0

reproduced your data :

      Id Gender   Age Participate    Q1   Q10    Q2    Q3    Q4
*  <int>  <chr> <int>       <int> <chr> <chr> <chr> <chr> <chr>
1     16   Male    20           1     0     1     0     1     1
2     17   Male    40           1     1     0     0     0     0
3     18   Male    33           1     1     0     0     0     0
4     19   Male    18           1     1     0     0     0     0
5     20   Male    24           1     0     0     1     0     0
6     21 Female    42           1     0     0     1     0     0
7     22 Female    19           1     1     0     0     1     1
8     28 Female    49           1     0     1     1     0     0
9     29 Female    17           1     1     0     1     0     0
10    31   Male    18           1     1     0     1     0     0

first you need to convert Q1 and Q10 to numeric format since they're currently saved as character.

Mutate_each in the Dplyr package allows you to apply one or more functions to one or more columns to where starts_with in the same package allow you to select variables based on their names.

So using a combination of both you can do the following :

library(dplyr)
data <- data %>% 
  mutate_each(funs(as.numeric), starts_with("Q"))

Look at the results :

str(data)
'data.frame':   10 obs. of  9 variables:
 $ Id         : int  16 17 18 19 20 21 22 28 23 31
 $ Gender     : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 1 1 1 1 2
 $ Age        : int  20 40 33 18 24 42 19 49 17 18
 $ Participate: int  1 1 1 1 1 1 1 1 1 1
 $ Q1         : num  0 1 1 1 0 0 1 0 1 1
 $ Q10        : num  1 0 0 0 0 0 0 1 0 0
 $ Q2         : num  0 0 0 0 1 1 0 1 1 1
 $ Q3         : num  1 0 0 0 0 0 1 0 0 0
 $ Q4         : num  1 0 0 0 0 0 1 0 0 0

Your Q* variables are now numeric so you can treat them by selecting only the variables starting with "Q" using the dplyr::select verb and using rowSumns allows you to sum up all the columns of a given row so :

data %>% select(starts_with("Q")) %>% rowSums(.) -> data$Score

Where :

  • select(starts_with("Q")) = Select the columns starting with Q

  • rowSums(.) = sum up the selected columns

  • -> = assign the result to data$Score

and then you can check the results :

   Id Gender Age Participate Q1 Q10 Q2 Q3 Q4 Score
1  16   Male  20           1  0   1  0  1  1     3
2  17   Male  40           1  1   0  0  0  0     1
3  18   Male  33           1  1   0  0  0  0     1
4  19   Male  18           1  1   0  0  0  0     1
5  20   Male  24           1  0   0  1  0  0     1
6  21 Female  42           1  0   0  1  0  0     1
7  22 Female  19           1  1   0  0  1  1     3
8  28 Female  49           1  0   1  1  0  0     2
9  23 Female  17           1  1   0  1  0  0     2
10 31   Male  18           1  1   0  1  0  0     2
Menelith
  • 521
  • 2
  • 4
  • 13