1

I have the following sample data frame:

df <- data.frame("Alpha" = c(NA, NA, 6, 5, 4, 6, 5, 3), "Beta" = c(3, 3, 4, 2, 6, NA, NA, NA), "Gamma" =c(6, 2, 3, 1, NA, NA, 5, 4))

From this data, I would like to get a count of all values between 0 and 6 for each column. The data frame does not contain all values between 0 and 6, so the final output would look something like this:

result <- data.frame("value"=c(0, 1, 2, 3, 4, 5, 6),
                     "Alpha"=c(0, 0, 0, 1, 1, 2, 2),
                     "Beta"=c(0, 0, 1, 2, 1, 0, 1),
                     "Gamma"=c(0, 1, 1, 1, 1, 1, 1))

value       Alpha     Beta     Gamma
    0           0        0         0
    1           0        0         1
    2           0        1         1
    3           1        2         1
    4           1        1         1
    5           2        0         1
    6           2        1         1         

My first inclination was to reiterate the distinct() function in dplyr. I was thinking of using something like this:

df.alpha <- df %>% distinct(Alpha)
df.beta <- df %>% distinct(Beta)
df.gamma <- df %>% distinct(Gamma)

Afterward, I would bind them together. However, I encounter three issues:

  • There's a lot of copy and pasting here (there are more columns in my real data frame)
  • The results do not have the same length, which makes binding difficult; and
  • "0" is not a value in the original table, so it does not get counted in the results.

I found a similar question in this stackoverflow post on counting a specific value in multiple columns at once. However, unlike that post, the issue I have here is that there is no variable to "group by".

Do folks have any suggestions on how I can produce a count of values between a range of integers for all columns? Thanks so much!

dkwong
  • 13
  • 4

3 Answers3

1

Maybe something like this:

> df[] <- lapply(df,function(x) factor(x,levels = 0:6))
> data.frame(lapply(df,tabulate))
  Alpha Beta Gamma
1     0    0     0
2     0    0     1
3     0    1     1
4     1    2     1
5     1    1     1
6     2    0     1
7     2    1     1
joran
  • 169,992
  • 32
  • 429
  • 468
1

A one-liner similar to joran's answer is

cbind.data.frame(values=0:6, sapply(df, function(x) table(factor(x, levels=0:6))))

this returns

  values Alpha Beta Gamma
0      0     0    0     0
1      1     0    0     1
2      2     0    1     1
3      3     1    2     1
4      4     1    1     1
5      5     2    0     1
6      6     2    1     1

Replacing table with the tabulate function should speed up the result and also simplify the output.

lmo
  • 37,904
  • 9
  • 56
  • 69
1

Another idea with tidyverse:

library(dplyr)
library(purrr)

df %>%
  mutate_all(factor, levels = 0:6) %>%
  map_dfc(~ c(table(.))) %>%
  cbind(values = 0:6, .)

Result:

  values Alpha Beta Gamma
1      0     0    0     0
2      1     0    0     1
3      2     0    1     1
4      3     1    2     1
5      4     1    1     1
6      5     2    0     1
7      6     2    1     1
acylam
  • 18,231
  • 5
  • 36
  • 45