Make a function/for loop in R

Question

I have this piece of code that i need to make as a for loop. This code only works for GroupA, and I have a lot of Groups (grouoA, groupB, groupC etc.) - how can i make it into a function/foor loop so it makes a table for all my groups

dt <- data.table(ID = c(1, 2, 3, 4),
                 sex= c('F', 'M', 'F', 'F'),
                 education = c("A", "B", "C", "D"),
                 age = c(23, 34, 55, 77),  
                 groupA= c(1, 5, 2, 4), 
                 groupB= c(2, 3, 4, 3),
                 groupC= c(3, 1, 2, 4), 
                 groupD= c(2, 3, 1, 3),
                 groupE= c(3, 5, 2, 4)
             
                )

dt[, {
x=as.character(groupA)
x[is.na(groupA)]='Missing'
values <- unique(x)
foreach(v = values, .combine = rbind)%do%){
out=data.table(group = "groupA", 
               N = sum(x == v),
               mean = sum(x==v)/.N))
out
}}]

Expected output is a table with;

group      N   mean
groupA     1     x (a number)
groupBetc  1     x (a number)

Make it reproducible, provide example data and expected output. But my guess is read about dot dot - `..()`. Maybe a related post: https://stackoverflow.com/q/21658893/680068 — zx8754, Apr 12 '23 at 12:04
This is similar to https://stackoverflow.com/q/11562656/3358272, https://stackoverflow.com/q/1660124/3358272 — r2evans, Apr 12 '23 at 15:47

Merijn van Tilborg · Answer 1 · 2023-04-12T12:53:19.697

2

You basically want to summarize by group, in this case on only "group". You have your groups in long format, so we use melt here (similar to dyplyr's pivot_longer) to get your values in one column and your groups in another column. Then we just summarize your data. This is the data.table way of doing this.

melt(dt, measure = patterns("^group"))[, .(.N, mean = mean(value)), by = .(group = variable)]

results

    group N mean
1: groupA 4 3.00
2: groupB 4 3.00
3: groupC 4 2.50
4: groupD 4 2.25
5: groupE 4 3.50

edited Apr 12 '23 at 12:53

answered Apr 12 '23 at 12:47

Merijn van Tilborg

5,452
1
7
22

what is value and variable? Should i define these – Hellihansen Apr 13 '23 at 08:09
1

@Hellihansen, `value` and `variable` are the default names of columns created by `melt`. They can be changed with `melt(..., variable.name="name1", value.name="name2")` which means the next `[`-expression would instead be `[, .(.N, mean=mean(name2)),by=.(group=variable)]`. It's part aesthetics, part lazy-typing (don't type when you don't need to), but perhaps this could be _slightly_ less obscure with `melt(dt, measure.vars=patterns("^group"), variable.name="group")[, .(.N, mean=mean(value)), by = .(group)]`. – r2evans Apr 13 '23 at 11:54
1

Bottom line, though ... did you try it? @Hellihansen – r2evans Apr 13 '23 at 11:55

score 1 · Answer 2 · answered Apr 12 '23 at 13:40

Based on your groupA example and the answer of @Merijn (which has some great data.table code!), you might have a different definition of "mean" and are looking for the fraction of values with a given group "number" over the number of unique groups.

The below modification of the code might return the desired results (I also included an NA example:

library(data.table)
dt <- data.table(ID = c(1, 2, 3, 4),
                 sex= c('F', 'M', 'F', 'F'),
                 education = c("A", "B", "C", "D"),
                 age = c(23, 34, 55, 77),  
                 groupA= c(1, 5, 2, 4), 
                 groupB= c(2, 3, 4, 3),
                 groupC= c(3, NA, 2, 4), 
                 groupD= c(2, 3, 1, 3),
                 groupE= c(3, 5, 2, 4)
                )
res <- melt(dt, measure = patterns("^group"))[, .(.N), by = .(group = variable, val = value)]
res[, mean:=N/.N, by=group]
res[, val:= ifelse(is.na(val), "Missing", val)][]
#>      group     val N      mean
#>  1: groupA       1 1 0.2500000
#>  2: groupA       5 1 0.2500000
#>  3: groupA       2 1 0.2500000
#>  4: groupA       4 1 0.2500000
#>  5: groupB       2 1 0.3333333
#>  6: groupB       3 2 0.6666667
#>  7: groupB       4 1 0.3333333
#>  8: groupC       3 1 0.2500000
#>  9: groupC Missing 1 0.2500000
#> 10: groupC       2 1 0.2500000
#> 11: groupC       4 1 0.2500000
#> 12: groupD       2 1 0.3333333
#> 13: groupD       3 2 0.6666667
#> 14: groupD       1 1 0.3333333
#> 15: groupE       3 1 0.2500000
#> 16: groupE       5 1 0.2500000
#> 17: groupE       2 1 0.2500000
#> 18: groupE       4 1 0.2500000

^{Created on 2023-04-12 with reprex v2.0.2}

score 1 · Answer 3 · answered Apr 12 '23 at 13:51

1

Similar to previous post from @Merijn , just the dplyr way of doing it

library(data.table)
library(dplyr)
df <- melt(dt, measure = patterns("^group"))
df %>% group_by(variable) %>% summarise(N=n(), mean=mean(value))

answered Apr 12 '23 at 13:51

user1254315979742

46
5

what is variable and value? – Hellihansen Apr 13 '23 at 07:56
variable and value are column names in df (melted dt). variable column contains group name (groupA..groupE) while value column contains respected value for each group as in your original dt in rowwise manner – user1254315979742 Apr 13 '23 at 09:31

Make a function/for loop in R

3 Answers3