I have a matrix (10 x 100) and need to count how many times each integer occurs in each column, giving a final matrix that is 3 x 100: the counts of 0, 1, and 2 per column.

I think the apply function will be useful here; the code below is the kind of solution I envision.

Any help will be greatly appreciated.

library(dplyr)
set.seed(100)
# note: only 100 values are sampled, so they are recycled to fill the 1,000 cells
a <- matrix(sample(0:2, size=100, replace=TRUE), nrow=10, ncol=100)
out <- apply(a, 2, function(x) count(x))

 Desired output: each row holds the count of one value (0, 1, 2) and each column corresponds to a column of the input

   1 2 3 ...  n
 0 1 1 3
 1 6 3 3
 2 3 6 4
Connor Murray

2 Answers

There is a function called table that counts how often each distinct value occurs in an object. You can apply it to each column, i.e.

apply(a, 2, table)

To include NAs in the count, just use the useNA option, i.e.

apply(a, 2, table, useNA = 'always')

#or with complete syntax
apply(a, 2, function(i)table(i, useNA = 'always'))
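As a rough illustration of why the two forms above are equivalent: apply forwards any extra arguments to the function it is given, whereas writing table(useNA = "always") calls table immediately with no data. The small matrix m below is a made-up example, not the data from the question.

```r
# Hypothetical 4 x 2 matrix; both columns contain 0, 1 and 2, column 1 also has an NA
m <- matrix(c(0, 1, 2, NA,
              0, 1, 2, 2), nrow = 4)

# Extra arguments after the function are passed on to table() for each column
by_dots <- apply(m, 2, table, useNA = "always")

# The same call written with an explicit anonymous function
by_anon <- apply(m, 2, function(i) table(i, useNA = "always"))

# Both give the same counts
all(by_dots == by_anon)

# Calling table() up front runs it with no data, which raises an error;
# tryCatch() is used here only to capture the message instead of stopping
res <- tryCatch(
  apply(m, 2, table(useNA = "always")),
  error = function(e) conditionMessage(e)
)
res
```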

As @IceCreamToucan mentions in the comments, if any column does not contain all three values, then you won't be able to coerce the result to a data frame (or a matrix, for that matter). To overcome this, we can convert each column to a factor with levels = c(0:2), i.e.

apply(a, 2, function(i) table(factor(i, levels = c(0:2)), useNA = 'always'))
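A minimal sketch of the difference the factor levels make, using a made-up matrix m (not the question's data) in which one column has no 2s and another contains an NA:

```r
# Hypothetical 4 x 3 matrix: column 2 has no 2s, column 3 contains an NA
m <- matrix(c(0, 1, 2, 2,
              0, 0, 1, 1,
              2, NA, 0, 1), nrow = 4)

# Without fixed levels the per-column tables have different lengths,
# so apply() cannot simplify them and returns a list
ragged <- apply(m, 2, table)
is.list(ragged)

# With fixed levels every table has the same four entries (0, 1, 2, NA),
# so apply() simplifies the result to a 4 x 3 matrix
counts <- apply(m, 2, function(i) table(factor(i, levels = 0:2), useNA = "always"))
dim(counts)
counts
```

Because every column now yields the same set of entries, the result can also be passed to as.data.frame without the differing-number-of-rows error.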
Sotos
  • My actual matrix is huge (100 x 3,000,000) and it contains NAs in some of the rows. `table` has an option called `useNA`, but I am having trouble using it properly. Do you have any advice? `as <- as.data.table(apply(a, 2, table(useNA="always")))` gives `Error in table(useNA = "always") : nothing to tabulate` – Connor Murray Oct 29 '19 at 13:15
  • You just made a syntax mistake. Try `apply(a, 2, table, useNA = 'always')` – Sotos Oct 29 '19 at 13:18
  • If some columns don't contain all three integers, attempting to make this a data.frame with `as.data.frame(apply(a, 2, table))` will result in `Error ... arguments imply differing number of rows: 3, 2` (or 3, 1 depending on the number of integers present) – IceCreamToucan Oct 29 '19 at 13:27
  • What do you mean? What would be a case? – Sotos Oct 29 '19 at 13:29
  • This example is such a case. Some columns don't have all three integers, so the table only has two elements for that column, and `as.data.frame` produces an error. If you're using `as.data.table` you get a warning and it recycles the short outputs. – IceCreamToucan Oct 29 '19 at 13:29
  • @IceCreamToucan Edited. Give it a try now – Sotos Oct 29 '19 at 13:40
  • Yep, that works as expected. I'm guessing I ran into the [R version set.seed difference](https://stackoverflow.com/questions/47199415/is-set-seed-consistent-over-different-versions-of-r-and-ubuntu) wrt this example. Time to finally upgrade this machine I guess. – IceCreamToucan Oct 29 '19 at 13:42
  • @IceCreamToucan ahhh...makes sense. I had no idea such a case existed – Sotos Oct 29 '19 at 13:48
This produces a data.frame and ensures that for each input column the output column includes all three integers (even if the input column does not). Note: colSums also has a na.rm argument.

data.frame(
  int_counted = 0:2, 
  do.call(rbind, lapply(0:2, function(x) colSums(a == x)))
)

Output

#   int_counted X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
# 1           0  4  2  2  2  3  8  1  2  3   3   4   2   2   2   3   8   1   2   3   3   4   2
# 2           1  5  4  4  3  2  2  6  4  3   4   5   4   4   3   2   2   6   4   3   4   5   4
# 3           2  1  4  4  5  5  0  3  4  4   3   1   4   4   5   5   0   3   4   4   3   1   4
#   X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X40 X41 X42 X43 X44 X45
# 1   2   2   3   8   1   2   3   3   4   2   2   2   3   8   1   2   3   3   4   2   2   2   3
# 2   4   3   2   2   6   4   3   4   5   4   4   3   2   2   6   4   3   4   5   4   4   3   2
# 3   4   5   5   0   3   4   4   3   1   4   4   5   5   0   3   4   4   3   1   4   4   5   5
#   X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68
# 1   8   1   2   3   3   4   2   2   2   3   8   1   2   3   3   4   2   2   2   3   8   1   2
# 2   2   6   4   3   4   5   4   4   3   2   2   6   4   3   4   5   4   4   3   2   2   6   4
# 3   0   3   4   4   3   1   4   4   5   5   0   3   4   4   3   1   4   4   5   5   0   3   4
#   X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91
# 1   3   3   4   2   2   2   3   8   1   2   3   3   4   2   2   2   3   8   1   2   3   3   4
# 2   3   4   5   4   4   3   2   2   6   4   3   4   5   4   4   3   2   2   6   4   3   4   5
# 3   4   3   1   4   4   5   5   0   3   4   4   3   1   4   4   5   5   0   3   4   4   3   1
#   X92 X93 X94 X95 X96 X97 X98 X99 X100
# 1   2   2   2   3   8   1   2   3    3
# 2   4   4   3   2   2   6   4   3    4
# 3   4   4   5   5   0   3   4   4    3
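For the na.rm note above, a small sketch of how it might be used. The matrix m is a hypothetical example with one NA; the extra NA row is an optional addition, not part of the original answer.

```r
# Hypothetical 4 x 3 matrix with a single NA in column 3
m <- matrix(c(0, 1, 2, 2,
              0, 0, 1, 1,
              2, NA, 0, 1), nrow = 4)

# Without na.rm, any column that contains an NA gets NA for every count
do.call(rbind, lapply(0:2, function(x) colSums(m == x)))

# With na.rm = TRUE the NA cells are skipped when counting
counts <- do.call(rbind, lapply(0:2, function(x) colSums(m == x, na.rm = TRUE)))

# Optionally append a row holding the number of NAs per column
counts <- rbind(counts, colSums(is.na(m)))
rownames(counts) <- c("0", "1", "2", "NA")
counts
```

Since colSums works on the whole matrix at once, this stays vectorized over columns, which may matter for a matrix as wide as the one mentioned in the comments.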
IceCreamToucan