
I'm pretty new to R and I'm trying to figure out how to write code to get the frequency for multiple columns based on different conditions.

Example Data

ID        Group Age Gender Total_T  Neg_Mood_T  Interpersonal_Prob_T    
6000-01-00  0   9   1   44.00   49.00   42.00   44.00   48.00   40.00
6000-02-00  0   12  1   53.00   54.00   42.00   59.00   52.00   51.00
6000-03-00  0   7   2   72.00   50.00   56.00   58.00   81.00   84.00
6000-04-00  0   7   1   41.00   44.00   49.00   47.00   41.00   40.00
6000-05-00  0   9.5 1   38.00   44.00   42.00   39.00   41.00   40.00
6000-06-00  1   8   1   39.00   38.00   57.00   39.00   41.00   40.00
6000-07-00  1   9   1   38.00   44.00   42.00   39.00   41.00   40.00
6000-08-00  1   18  1   41.00   44.00   44.00   48.00   41.00   40.00
6000-09-00  1   9   2   58.00   54.00   45.00   47.00   69.00   56.00
6000-10-00  1   11  2   42.00   40.00   45.00   47.00   46.00   40.00

So, I started with some simple code that counts how often a variable meets a condition within a group:

condition 1:

Total_T <- sum(data$Total_T[data$Group==0]>=60, na.rm=TRUE)

condition 2:

Total_T <- sum(data$Total_T[data$Group==0]<60, na.rm=TRUE)

However, I need to repeat this code for several more variables and conditions (i.e. condition 1 would be repeated for 4 more variables, as would condition 2, and so forth), and I would like to make it more efficient.

So, I'm hoping to write code that returns the frequency for Total_T, Neg_Mood_T, etc. based on the conditions I place on Group, Age and Gender.

I've tried data.frame(table()) and ddply, but I'm honestly stumped.

Thanks !

solomo31
  • Please read [(1)](http://stackoverflow.com/help/how-to-ask) how do I ask a good question, [(2)](http://stackoverflow.com/help/mcve) How to create a MCVE as well as [(3)](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610) how to provide a minimal reproducible example in R. Then edit and improve your question accordingly. – Christoph Oct 25 '16 at 19:43
  • The cliff's notes version is to add 1) some example data 2) your desired output 3) the logic to get you there. – Pierre L Oct 25 '16 at 19:46

1 Answer


We can use subset to get the part of the data we need, then sum:

x1 <- subset(data, Group== 0 & Gender == 1, select="Total_T")
sum(x1[x1 >= 60], na.rm=TRUE)
sum(x1[x1 < 60], na.rm=TRUE)

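# Wrapped in a function
fun <- function(cols) {
  x1 <- subset(data, Group == 0 & Gender == 1, select = cols)
  sum(x1[x1 >= 60], na.rm = TRUE)
}

fun("Total_T")
[1] 0
fun("Neg_Mood_T")
[1] 0
# Both are 0 on the sample data because no value in this subset is >= 60.
# With `< 60` in place of `>= 60`, the sums are 176 and 191.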

If you would like to get all the columns in one shot, you can use:

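library(dplyr)
# across() needs dplyr >= 1.0.0; on older versions use
# summarise_at(-(1:4), funs(sum(.[. < 60]))) (funs() is now deprecated)
data %>% filter(Group == 0 & Gender == 1) %>%
  summarise(across(-(1:4), ~ sum(.x[.x < 60])))
# Total_T Neg_Mood_T Interpersonal_Prob_T col7 col8 col9
# 1     176        191                  175  189  182  171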
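
To get counts rather than sums, broken down by every Group and Gender combination at once, base R's aggregate() is one option. This is only a minimal sketch and not part of the original answer: it rebuilds the sample rows from the question with just the three named score columns, and `count_high` is a made-up helper name.

```r
# Sample rows from the question (only the three named score columns are kept).
data <- data.frame(
  Group  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Hypothetical helper: number of values at or above the cutoff.
count_high <- function(v, cutoff = 60) sum(v >= cutoff)

# One row per Group x Gender combination, one count per score column.
# Note: the formula interface drops rows that contain NA by default.
counts <- aggregate(
  cbind(Total_T, Neg_Mood_T, Interpersonal_Prob_T) ~ Group + Gender,
  data = data,
  FUN = count_high
)
counts
```

On these rows the only score at or above 60 is the Total_T of 72 in Group 0, Gender 2, so that cell is 1 and every other count is 0.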

Edit

There is a difference between summing the values of Total_T that meet the condition and counting how many values meet it. We can show this with an example:

x <- 1:10

#condition
x > 5

#1. sum values fitting the condition
sum(x[x > 5])
[1] 40

#2. count how many values fit the condition
sum(x > 5)
[1] 5
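
Applied to the question's data, the counting form can cover several columns in one call. The following is a rough sketch under assumptions: the data frame is rebuilt from the sample rows with only the three named score columns, and `count_at_least` is a hypothetical helper name, not something from the original answer.

```r
# Sample rows from the question (only the three named score columns are kept).
data <- data.frame(
  Group  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age    = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Hypothetical helper: for each column in `cols`, count the rows of `df`
# whose value is at or above `cutoff`. The comparison yields a logical
# matrix, and colSums() adds up the TRUEs per column.
count_at_least <- function(df, cols, cutoff = 60) {
  colSums(df[, cols, drop = FALSE] >= cutoff, na.rm = TRUE)
}

score_cols <- c("Total_T", "Neg_Mood_T", "Interpersonal_Prob_T")

# The row conditions go into subset(); the helper only does the counting.
counts_group0 <- count_at_least(subset(data, Group == 0), score_cols)
counts_group0
```

For Group 0 this gives 1 for Total_T (the single value of 72) and 0 for the other two columns. Conditions on Age or Gender can be added inside subset() in the same way.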
Pierre L
  • Hey ! So, the code I'm trying to create has to account for the less than/greater than 60 condition. For instance, I am looking for the frequency of Total_T when Total_T> 60 when group is 0 and gender is 1, if that makes sense. – solomo31 Oct 25 '16 at 20:58
  • Edited to reflect the condition. – Pierre L Oct 25 '16 at 21:01
  • Hi Pierre, So, I tried running the code at home and every time I tried to run it fun("Total_T") returned 0 as did the other variable. Do you know why that might be? Also, I realized my initial code might have been misleading. I used the sum function because I was able to get the frequency of that particular variable based on those conditions rather than the actual sum. So, using Total_T <- sum(data$Total_T[data$Group==0]>=60, na.rm=TRUE), I would have gotten 1. So, I suppose in this case the sum function might be inappropriate, would it be better to use length? And how would that look? – solomo31 Oct 26 '16 at 01:21
  • The reason you get 0 is because there are no `Total_T` values greater than or equal to 60 – Pierre L Oct 26 '16 at 01:39
  • If you would like to add up the number of times `Total_T` is greater than 60 and fits the conditions then I can adjust the function. – Pierre L Oct 26 '16 at 01:39
  • Hi Pierre, My bad. I misunderstood the code but your explanation cleared it up for me. I was actually able to edit the code to add up the number of times a variable occurs based on some condition. if it's not too much trouble, would you mind explaining function(cols)? – solomo31 Oct 26 '16 at 04:16
  • cols is the name that I gave to the function's input. It's a made-up, arbitrary name; I could've used x, or 'input', or anything else. – Pierre L Oct 26 '16 at 10:57
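
To illustrate the point about argument names, here is a small sketch with invented names (`scores`, `count_high_a`, `count_high_b` are all hypothetical). The two functions differ only in what the argument is called, and they behave identically.

```r
# Tiny made-up data frame, just for the illustration.
scores <- data.frame(
  Total_T    = c(44, 53, 72),
  Neg_Mood_T = c(49, 54, 50)
)

# Same body, different argument name.
count_high_a <- function(cols) sum(scores[[cols]] >= 60)
count_high_b <- function(column_name) sum(scores[[column_name]] >= 60)

count_high_a("Total_T")
count_high_b("Total_T")
```

Both calls return 1, because the argument name only matters inside the function body.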