0

I'm new to R and would very much appreciate if you could share your thoughts. Having found Applying calculation per groups within R dataframe, I decided to use data.table to calculate % of respondents who returned 'b' out of all respondents who returned either 'a' or 'b'.

My example data can be obtained from below (with some warning messages):

library(data.table)
df = data.table(rep(c("I", "II", "III", "IV")), rep(c("A", "B", "C")), 
                rep(c("a", "a", "b", "b", "b"), 10))
colnames(df) = c("grp1", "grp2", "Q1")

Below are my current codes to obtain a desired outcome;

#1. Obtain the number of rows on Q1 with 'a' and 'b' broken down by grp1 and grp2
ab = df[Q1 %in% c("a", "b"), .N, by = .(grp1, grp2)][order(grp1, grp2)] 

#2. Obtain the equivalent number of rows on Q1 with 'b' only
b = df[Q1 %in% "b", .N, by = .(grp1, grp2)][order(grp1, grp2)] 

# 3. Merge tables 'ab' and 'b' into 'a'
a = merge(b, ab, by = c("grp1", "grp2")) 

# 4. Calculate %b in table a
a$agree = a[, 3]/a[, 4]*100 


    grp1 grp2 N.x N.y agree
 1:    I    A   3   5    60
 2:    I    B   2   4    50
 3:    I    C   3   4    75
 4:   II    A   2   4    50
 5:   II    B   3   5    60
 6:   II    C   2   4    50
 7:  III    A   2   4    50
 8:  III    B   2   4    50
 9:  III    C   3   4    75
10:   IV    A   3   4    75
11:   IV    B   3   4    75
12:   IV    C   2   4    50


# 5. Remove unnecessary columns
a[, 3:4] = NULL 


    grp1 grp2 agree
 1:    I    A    60
 2:    I    B    50
 3:    I    C    75
 4:   II    A    50
 5:   II    B    60
 6:   II    C    50
 7:  III    A    50
 8:  III    B    50
 9:  III    C    75
10:   IV    A    75
11:   IV    B    75
12:   IV    C    50

Above is the output I need. Now I have 50 questionnaire items (Q1 to Q50), and if I want to obtain the same outputs for those 50 questions, I would have to repeat the same process (1 to 5) 50 times, and this is not ideal.

I would like to find a way to avoid this repetitive process.

I'm sure there must be an elegant way to automate this process, but as a r-newbie, I'm yet to find out how. Rather than driven by my limited knowledge, I thought I should be guided by good practice you may have.

Community
  • 1
  • 1
tmr
  • 13
  • 5
  • 2
    Your data is not reproducible and it is not clear how you want to specifically extend your (working?) code. See http://stackoverflow.com/a/28481250/ for guidance on asking a good R question here. – Frank Mar 14 '17 at 15:28
  • 1
    Thank you Frank for editing my question – tmr Mar 14 '17 at 21:02
  • You can do `df[, sum(Q1 %in% "b")/.N*100, by=grp1:grp2]` – Frank Mar 14 '17 at 21:09
  • Thank you Frank - I'll try your code. – tmr Mar 14 '17 at 21:32
  • Sorry again - in the example data, there was no missing data, but in reality, there are some NAs in my data. This was why I created data.table 'ab' and 'b'. Is there any function to remove NAs in the calculation? na.rm = T didn't work – tmr Mar 14 '17 at 21:47
  • I don't follow. The numerator will not be affected by NAs (since %in% always gives back TRUE/FALSE). If you need the denominator to exclude NAs, use `sum(!is.na(Q1))` instead of `.N`..? – Frank Mar 14 '17 at 21:55
  • 1
    Great - thank you so much – tmr Mar 14 '17 at 21:56
  • @Frank, thank you for your advice! The new code `df[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2]` works perfect. I then decided to create a function using the code so that I could reduce the repetitive process as much as I can. I then got an error message `Error in eval(expr, envir, enclos) : object 'grp1' not found`. Should I edit my current question or post a new question? The code looks like `test = function(question, groupA, groupB){ df[, sum(question %in% "b")/sum(!is.na(question))*100, by = groupA : groupB] }` `test(question = Q1, groupA = grp1, groupB = grp2)` – tmr Mar 15 '17 at 16:48
  • Ah, I would change two things: pass the column names in quotes ("Q1", etc), and change the `by=c(groupA, groupB)`. You'll also need to change `question` to `get(question)`, I guess. There are several questions about using data.table programmatically on SO already. I'd suggest reviewing those and seeing how far it gets you before posting a new question. In any case, yeah, I don't think you should edit it in here. – Frank Mar 15 '17 at 16:56
  • 1
    I asked you the question above so I could learn/follow the rules within Stackoverflow - thank you for your thoughts. – tmr Mar 15 '17 at 16:56
  • @Frank - thank you very much for sharing your thoughts. This is much appreciated. – tmr Mar 15 '17 at 16:57

0 Answers0