My dataset has several columns, as shown below:

ID...  Var1  Var2  Var3  Var4  Var5  Var6...
1...   Yes   No    Yes   No    No    Yes...
2...   No    No    No    No    No    No...
3...   Yes   Yes   Yes   Yes   No    Yes...
4...   No    No    No    No    No    No...
5...   No    Yes   Yes   No    No    Yes...
6...   No    No    Yes   No    No    Yes...
7...   Yes   Yes   No    No    No    No...

I want to count how many "Yes" values each ID has across these 6 variables, that is, I want to add a column like this:

ID...  Var1  Var2  Var3  Var4  Var5  Var6  Count
1...   Yes   No    Yes   No    No    Yes   3
2...   No    No    No    No    No    No    0
3...   Yes   Yes   Yes   Yes   No    Yes   5
4...   No    No    No    No    No    No    0
5...   No    Yes   Yes   No    No    Yes   3
6...   No    No    Yes   No    No    Yes   2
7...   Yes   Yes   No    No    No    No    2

I am using R for the data management. Can anyone offer some guidance or help with the R syntax?

Marcus
  • Check out `rowSums` - `mat <- matrix(c("yes","no","yes","yes"),ncol=2); rowSums(mat=="yes")` for example. – thelatemail Mar 22 '18 at 23:42
  • Hi Marcus, it's a local rule that you also need to show what you tried. So it would be good to edit your question and add to the end "I tried this..." and "And this happened...". – Leon Bambrick Mar 22 '18 at 23:42
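
To unpack the `rowSums` suggestion in the comment above: comparing a character matrix (or data frame) to a string gives a logical matrix of the same shape, and `rowSums` then counts the `TRUE` values in each row, because `TRUE` is treated as 1 and `FALSE` as 0. A minimal sketch, using a made-up two-column matrix rather than the asker's data (the names `mat`, `is_yes` and `yes_per_row` are only for illustration):

```r
# Hypothetical 2 x 2 character matrix, filled column by column
mat <- matrix(c("Yes", "No", "Yes", "Yes"), ncol = 2)

# Element-wise comparison gives a logical matrix of the same shape
is_yes <- mat == "Yes"

# TRUE counts as 1 and FALSE as 0, so this is the number of "Yes" per row
yes_per_row <- rowSums(is_yes)
print(yes_per_row)
```

Note that `==` on strings is case-sensitive, so the comparison value has to match the spelling used in the data ("Yes", not "yes").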

1 Answer

Here's a smaller, reproducible example and a solution to your problem. (Hint for the future: make sure to include an easy-to-use, reproducible sample of your data.)

df <- data.frame(v1 = c("Yes", "Yes", "Yes", "No"),
                 v2 = c("No", "Yes", "No", "No"),
                 v3 = c("Yes", "Yes", "No", "No"), stringsAsFactors = FALSE)

# Count the "Yes" values per row, here using only columns 1 and 3
df$count <- rowSums(df[c(1, 3)] == "Yes")

df

   v1  v2  v3 count
1 Yes  No Yes     2
2 Yes Yes Yes     2
3 Yes  No  No     1
4  No  No  No     0
Jake Kaupp
  • Thanks for your comments, I get what you mean. But the hard part is that I need to choose specific columns to count. The dataset definitely has more than these 3 columns. Suppose v1 to v3 were columns 11 to 13: how should I specify those 3 columns in your example? – Marcus Mar 22 '18 at 23:47
  • You specify the subset of the data you want to use via `[`. As shown above, I'm summing only columns one and three. – Jake Kaupp Mar 22 '18 at 23:53
  • I tried this on my data and found that something is wrong. I have 6 column variables, and I need to count how many of them are "Yes" in each row. My R code is: `test$AdvancedServiceCount=rowSums(test[c(10,15)] == "Yes")`. The result was not correct: some rows have 3 "Yes" values, but the maximum value in my output is 2. So I was totally confused. – Marcus Mar 23 '18 at 00:19
  • Your code takes the sum of only columns 10 and 15. You want `test$AdvancedServiceCount=rowSums(test[10:15] == "Yes")` – Jake Kaupp Mar 23 '18 at 00:20
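
The difference between the two calls in the comments above comes down to how the column index is built: `c(10, 15)` is a vector holding just the numbers 10 and 15, while `10:15` expands to 10, 11, 12, 13, 14, 15. A minimal sketch on the sample rows from the question, assuming for illustration that the data frame is called `dat` and that `Var1` to `Var6` sit in columns 2 to 7 (the real dataset has other columns that are not shown, so the positions there will differ):

```r
# Sample rows from the question; the name `dat` and the column positions
# (Var1..Var6 in columns 2..7) are assumptions made for this sketch
dat <- data.frame(
  ID   = 1:7,
  Var1 = c("Yes", "No", "Yes", "No", "No",  "No",  "Yes"),
  Var2 = c("No",  "No", "Yes", "No", "Yes", "No",  "Yes"),
  Var3 = c("Yes", "No", "Yes", "No", "Yes", "Yes", "No"),
  Var4 = c("No",  "No", "Yes", "No", "No",  "No",  "No"),
  Var5 = c("No",  "No", "No",  "No", "No",  "No",  "No"),
  Var6 = c("Yes", "No", "Yes", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# c(2, 7) selects only the first and last of the six columns,
# so the count can never exceed 2
first_last <- rowSums(dat[c(2, 7)] == "Yes")

# 2:7 selects every column from 2 through 7
dat$Count <- rowSums(dat[2:7] == "Yes")

# Selecting by name gives the same counts without relying on positions
by_name <- rowSums(dat[paste0("Var", 1:6)] == "Yes")

print(dat)
```

Here `dat$Count` reproduces the `Count` column requested in the question (3, 0, 5, 0, 3, 2, 2), whereas `first_last` tops out at 2, which matches the symptom described in the comment. Selecting by name is one option when column positions are likely to shift.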