Loop group_by function over multiple columns

Question

So I am trying to create a table with counts of distinct records in my data table

mytable <-
   group team num  ID
 1   a   x    1    9
 2   a   x    2    4
 3   a   y    3    5
 4   a   y    4    9
 5   b   x    1    7
 6   b   y    4    4
 7   b   x    3    9
 8   b   y    2    8

The column names are group,team, num, and ID. I want an individual table that contains the counts of distinct records in each of the columns. I want the table names to be in the format "table_colName"

colName <- c('group','team','num','ID')

for (col in colName)
     'table_'+colName <- mytable %>% group_by(col) %>% summarise(Count = n())

This generate an error "Error in grouped_df_impl(data, unname(vars), drop) : Column col is unknown".

Is there a way I can iterate through the group_by function using the columns in my data table and to save it to a new data table each time so that in this example I end up with table_group, table_team,table_num, and table_ID?

In the future please put your sample data in a reproducible format, such as by using `dput()` or creating the data within your question (in the code). — Hack-R, Jul 16 '18 at 20:35
Related to Hack-R's suggestion, you can similarly explicitly provide an object containing your desired output so you won't need to make clarifying comments after someone posts an answer. Some guidance on such things: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 — Frank, Jul 16 '18 at 21:40

score 1 · Answer 1 · answered Jul 16 '18 at 21:42

An option is to use group_by_at in combination with lapply. You need to pass columns of mytable to lapply. The function will group each columns and result will be available in a list.

library(dplyr)

lapply(names(mytable), function(x){
  group_by_at(mytable, x)%>%summarise(Count = n()) %>% as.data.frame()
})


# [[1]]
#   group Count
# 1     a     4
# 2     b     4
# 
# [[2]]
#   team Count
# 1    x     4
# 2    y     4
# 
# [[3]]
#   num Count
# 1   1     2
# 2   2     2
# 3   3     2
# 4   4     2
# 
# [[4]]
#   ID Count
# 1  4     2
# 2  5     1
# 3  7     1
# 4  8     1
# 5  9     3

Data:

mytable <- read.table(text=
"group team num  ID
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8",
header = TRUE, stringsAsFactors = FALSE)

score 0 · Answer 2 · answered Jul 16 '18 at 20:57

0

try this:

mytable %>% 
  group_by(.dots=c('group','team','num','ID')) %>% 
  summarise(Count = n())

answered Jul 16 '18 at 20:57

j.y

1

Thanks for the help, unfortunately I might not have described the issue well enough. I am not try to set the group_by on the four columns as a whole. I want to create a data table for each individual column with the distinct counts. So in this case I want a table for 'group' where it has the counts of distinct groups, then separate table for 'team' with its distinct counts and so on – C.Lee Jul 16 '18 at 21:02

score 0 · Accepted Answer · answered Jul 16 '18 at 21:39

I was able to fix this with the code below, thank you all for your attempt at helping me but I am new to coding and probably did not phrase the question right, sorry!

 colName <- c('group','team','num','ID')

 for (col in colName) {
     tables <- paste('table',col, sep = '_')
     assign(tables, mytable %>% group_by(.dots = col) %>% summarise(Count = n()))
     }

score 0 · Answer 4 · answered Jul 18 '18 at 14:55

A solution using data.table and lapply.

Create data

library(data.table)

dt <- read.table(text = "
group team num  ID
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8")

Code to generate results

setDT(dt)

l <- lapply(cnms, function(i)setnames(dt[, .N, get(i)], "get", i))
names(l) <- paste0("table_", cnms)
str(l)

Loop group_by function over multiple columns

4 Answers4