I am not familiar with if statements, loops, or functions in R. I have a dataset in which I want to adjust a variable (N) for the clustering of the study, using the formula N/(1 + (M - 1) * ICC), where N is the number of subjects, M is the size of the cluster, and ICC is the intra-class correlation coefficient. These variables are in separate columns, with each row identifying a different study/sample size. Not all studies have a clustering issue, so I need to apply this function only to the subset that has an ICC. I thought about something like this, but I know it is missing something, and I also don't know whether a loop with an if statement is the most efficient way to go.

for (i in df$N) {                         # for every sample size in df$N
    if (df$ICC[i] != .) {                 # if ICC is not missing 
      df$N/(1 + (df$M - 1) * df$ICC)      # adjust the sample size by dividing the N by the size 
                                          # of the cluster - 1 and multiply per the ICC of the study
    } else {                              
      df$N/1             #otherwise (ICC is missing) do nothing: ie., divide the N by 1. 
    }
}

Do you know how I could do this with something like the code above? Other solutions are also welcome! Thanks for any help or suggestions!

Here's an example of the dataset:

dput(head(df, 10))
structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), ArmsID = c(0, 
1, 0, 1, 0, 1, 0, 1, 0, 1), N = c(26, 34, 28, 27, 50, 52, 60, 
65, 150, 152), Mean = c(10.1599998474121, 5.59999990463257, 8, 
8.52999973297119, 17, 15.1700000762939, 48.0999984741211, 49, 
57, 55.1315803527832), SD = c(6.30000019073486, 4.30000019073486, 
5.6, 6.61999988555908, 6, 7.75, 10.1599998474121, 12, 11, 10.5495901107788
), SE = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ICC = c(0.03, 
0.02, NA, NA, 0.01, 0.003, NA, NA, NA, NA), M = c(5, 5, NA, NA, 
17, 16, NA, NA, NA, NA)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

The `.` was meant to indicate missing data (NA). I want to apply the function that adjusts N only to the rows that have an ICC.

idx <- which(!is.na(df$ICC))
df$N[idx] <- df$N[idx]/(1 + (df$M[idx] - 1) * df$ICC[idx])

This code works correctly, thanks!
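For anyone who would rather keep the original N untouched, here is a minimal sketch of the same adjustment written with `ifelse()`. It is only an illustration: the data frame uses the first six rows of the sample data above, and `N_adj` is a hypothetical column name that is not in the original dataset.

```r
# First six rows of the sample data from the dput() output above
# (only the columns needed for the adjustment).
df <- data.frame(
  N   = c(26, 34, 28, 27, 50, 52),
  ICC = c(0.03, 0.02, NA, NA, 0.01, 0.003),
  M   = c(5, 5, NA, NA, 17, 16)
)

# N_adj is a hypothetical new column: the adjusted N where an ICC is
# available, and the unchanged N where ICC is NA.
df$N_adj <- ifelse(
  is.na(df$ICC),
  df$N,
  df$N / (1 + (df$M - 1) * df$ICC)
)

print(df)
```

For the first row this gives 26 / (1 + (5 - 1) * 0.03) = 26 / 1.12, while rows without an ICC keep their N. Note that a row with an ICC but a missing M would come out as NA, so that case may need separate handling.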

Ilaria
  • Please share a sample of df using `dput(df)` – Jamie Dec 01 '22 at 17:49
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 01 '22 at 17:56
  • In `(df$ICC[i] != .)`, the `.` seems wrong. What is that supposed to be? – r2evans Dec 01 '22 at 19:11
  • Regardless, you may calculate something (not sure), but you never capture the output value from either calculation, so this work is discarded silently. – r2evans Dec 01 '22 at 19:12
  • I have just noticed that the ICC values all appear to be 0, whereas the actual values are 0.02, 0.003, 0.01, 0.02. – Ilaria Dec 01 '22 at 20:03
  • Probably just `idx <- which(!is.na(df$ICC))` followed by `df$N[idx] <- df$N[idx]/(1 + (df$M[idx] - 1) * df$ICC[idx])`, but we need the results from `dput(head(df, 15))` pasted into your question to test. – dcarlson Dec 01 '22 at 22:09
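
The issues raised in the comments above (the `.` placeholder, looping over values instead of row positions, and the discarded result) can be illustrated with a corrected version of the loop. This is only a sketch, assuming the first six rows of the sample data; `N_adj` is a hypothetical name, and the vectorized approach from the last comment is usually the more idiomatic choice in R.

```r
# First six rows of the sample data from the question.
df <- data.frame(
  N   = c(26, 34, 28, 27, 50, 52),
  ICC = c(0.03, 0.02, NA, NA, 0.01, 0.003),
  M   = c(5, 5, NA, NA, 17, 16)
)

# Start from the unadjusted sample sizes (N_adj is a hypothetical name).
N_adj <- df$N

# Loop over row positions, not over the values of df$N.
for (i in seq_len(nrow(df))) {
  # Missing values are tested with is.na(), not by comparing to `.`.
  if (!is.na(df$ICC[i])) {
    # Store the result; otherwise the calculation is silently discarded.
    N_adj[i] <- df$N[i] / (1 + (df$M[i] - 1) * df$ICC[i])
  }
}

df$N_adj <- N_adj
print(df)
```

No `else` branch is needed here, because rows without an ICC simply keep the value they started with.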
