1

I have this piece of code that i need to make as a for loop. This code only works for GroupA, and I have a lot of Groups (grouoA, groupB, groupC etc.) - how can i make it into a function/foor loop so it makes a table for all my groups

dt <- data.table(ID = c(1, 2, 3, 4),
                 sex= c('F', 'M', 'F', 'F'),
                 education = c("A", "B", "C", "D"),
                 age = c(23, 34, 55, 77),  
                 groupA= c(1, 5, 2, 4), 
                 groupB= c(2, 3, 4, 3),
                 groupC= c(3, 1, 2, 4), 
                 groupD= c(2, 3, 1, 3),
                 groupE= c(3, 5, 2, 4)
             
                )

dt[, {
x=as.character(groupA)
x[is.na(groupA)]='Missing'
values <- unique(x)
foreach(v = values, .combine = rbind)%do%){
out=data.table(group = "groupA", 
               N = sum(x == v),
               mean = sum(x==v)/.N))
out
}}]

Expected output is a table with;

group      N   mean
groupA     1     x (a number)
groupBetc  1     x (a number)
Hellihansen
  • 163
  • 6
  • 1
    Make it reproducible, provide example data and expected output. But my guess is read about dot dot - `..()`. Maybe a related post: https://stackoverflow.com/q/21658893/680068 – zx8754 Apr 12 '23 at 12:04
  • I made an example dataset now – Hellihansen Apr 12 '23 at 12:06
  • This is similar to https://stackoverflow.com/q/11562656/3358272, https://stackoverflow.com/q/1660124/3358272 – r2evans Apr 12 '23 at 15:47

3 Answers3

2

You basically want to summarize by group, in this case on only "group". You have your groups in long format, so we use melt here (similar to dyplyr's pivot_longer) to get your values in one column and your groups in another column. Then we just summarize your data. This is the data.table way of doing this.

melt(dt, measure = patterns("^group"))[, .(.N, mean = mean(value)), by = .(group = variable)]

results

    group N mean
1: groupA 4 3.00
2: groupB 4 3.00
3: groupC 4 2.50
4: groupD 4 2.25
5: groupE 4 3.50
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22
  • what is value and variable? Should i define these – Hellihansen Apr 13 '23 at 08:09
  • 1
    @Hellihansen, `value` and `variable` are the default names of columns created by `melt`. They can be changed with `melt(..., variable.name="name1", value.name="name2")` which means the next `[`-expression would instead be `[, .(.N, mean=mean(name2)),by=.(group=variable)]`. It's part aesthetics, part lazy-typing (don't type when you don't need to), but perhaps this could be _slightly_ less obscure with `melt(dt, measure.vars=patterns("^group"), variable.name="group")[, .(.N, mean=mean(value)), by = .(group)]`. – r2evans Apr 13 '23 at 11:54
  • 1
    Bottom line, though ... did you try it? @Hellihansen – r2evans Apr 13 '23 at 11:55
1

Based on your groupA example and the answer of @Merijn (which has some great data.table code!), you might have a different definition of "mean" and are looking for the fraction of values with a given group "number" over the number of unique groups.

The below modification of the code might return the desired results (I also included an NA example:

library(data.table)
dt <- data.table(ID = c(1, 2, 3, 4),
                 sex= c('F', 'M', 'F', 'F'),
                 education = c("A", "B", "C", "D"),
                 age = c(23, 34, 55, 77),  
                 groupA= c(1, 5, 2, 4), 
                 groupB= c(2, 3, 4, 3),
                 groupC= c(3, NA, 2, 4), 
                 groupD= c(2, 3, 1, 3),
                 groupE= c(3, 5, 2, 4)
                )
res <- melt(dt, measure = patterns("^group"))[, .(.N), by = .(group = variable, val = value)]
res[, mean:=N/.N, by=group]
res[, val:= ifelse(is.na(val), "Missing", val)][]
#>      group     val N      mean
#>  1: groupA       1 1 0.2500000
#>  2: groupA       5 1 0.2500000
#>  3: groupA       2 1 0.2500000
#>  4: groupA       4 1 0.2500000
#>  5: groupB       2 1 0.3333333
#>  6: groupB       3 2 0.6666667
#>  7: groupB       4 1 0.3333333
#>  8: groupC       3 1 0.2500000
#>  9: groupC Missing 1 0.2500000
#> 10: groupC       2 1 0.2500000
#> 11: groupC       4 1 0.2500000
#> 12: groupD       2 1 0.3333333
#> 13: groupD       3 2 0.6666667
#> 14: groupD       1 1 0.3333333
#> 15: groupE       3 1 0.2500000
#> 16: groupE       5 1 0.2500000
#> 17: groupE       2 1 0.2500000
#> 18: groupE       4 1 0.2500000

Created on 2023-04-12 with reprex v2.0.2

user12728748
  • 8,106
  • 2
  • 9
  • 14
1

Similar to previous post from @Merijn , just the dplyr way of doing it

library(data.table)
library(dplyr)
df <- melt(dt, measure = patterns("^group"))
df %>% group_by(variable) %>% summarise(N=n(), mean=mean(value))