
I calculate the number of rows by group, where the grouping variable is a factor. I also want factor levels that are not represented in the data, i.e. have zero rows, to be included in the result.

A small data example with variable 'x', a factor with levels c("a", "b", "c"):

library(data.table)
DT = data.table(x = factor(rep(c("b", "a", "c"), each = 3)))

The data is filtered, e.g. all rows where x == "c" are removed, and the number of rows by group is calculated. However, the zero count for level "c" is missing from the result:

DT[x != "c"][, .N, by = x]
        x     N
   <fctr> <int>
1:      b     3
2:      a     3

The desired result should also include the zero count for "c":

        x     N
   <fctr> <int>
1:      b     3
2:      a     3
3:      c     0 # <--

Is there some way to get this output?

Henrik
iago
  • Isn't `DT[x!="c"][, as.data.frame(table(x))]` sufficient? – Roland May 16 '23 at 09:18
  • @Roland it is, indeed, although I would like a more `data.table` contained solution. – iago May 16 '23 at 09:32
  • `DT[, .N, x][x == "c", N := 0L][]` or `DT[, .(N = if ("c" == .BY$x) 0L else .N), x]` – r2evans May 16 '23 at 11:19
  • Related: [Frequency table including zeros for unused values, on a data.table](https://stackoverflow.com/questions/23547200/frequency-table-including-zeros-for-unused-values-on-a-data-table) – Henrik May 17 '23 at 08:00

2 Answers


Using the join syntax

DT[x != "c"][levels(x), on = "x", .N, by = .EACHI]

#         x     N
#    <char> <int>
# 1:      a     3
# 2:      b     3
# 3:      c     0
s_baldur
  • Great! Perfect answer. May you explain it a little bit, at least the `.EACHI`? Thanks!!! – iago May 16 '23 at 09:46
  • @iago `.EACHI` stands for "each i", following the logic of `DT[i, j, by]`: `j` is evaluated once for each element of `i`, in this case for each of the levels. You can play around with it, for example: `DT[c("a", "d", "e", "c", "a"), on = "x", .N, by = .EACHI]` – s_baldur May 16 '23 at 09:48
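
To make the `.EACHI` behaviour concrete, here is a minimal sketch that reuses the question's data. The names `filtered`, `lv`, and `res` are introduced only for illustration and are not part of the original answer.

```r
library(data.table)

DT <- data.table(x = factor(rep(c("b", "a", "c"), each = 3)))

# Subsetting rows does not drop unused factor levels,
# so levels(filtered$x) still contains "c"
filtered <- DT[x != "c"]

# i: every level of the factor, as a character vector
lv <- levels(filtered$x)

# by = .EACHI evaluates j once per element of i, so each level gets
# its own count; a level with no matching rows yields .N equal to 0
res <- filtered[lv, on = "x", .N, by = .EACHI]
print(res)
```

Because the rows of the result follow the order of `i`, the counts come back in level order ("a", "b", "c") rather than in order of first appearance in the data.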

You may try:

library(data.table)
remove_vars <- c("c")

DT[, if(all(x %in% remove_vars)) 0L else .N, by = x]

#   x V1
#1: b  3
#2: a  3
#3: c  0
Ronak Shah
  • I think you are missing the point. OP doesn't want to remove a level. They already have a factor variable containing zero values for a level. – Roland May 16 '23 at 09:22