1

I have a large matrix filled with True/False values under each column. Is there a way I can summarize the matrix so that every row is unique and I have a new column with the sum of how often that row appeared.

Example:

    A B C D E
[1] T F F T F
[2] T T T F F
[3] T F F T T
[4] T T T F F
[5] T F F T F

Would become:

    A B C D E total
[1] T F F T F  2
[2] T T T F F  2
[3] T F F T F  1

EDIT

I cbind this matrix with a new column rev so I now have a data.frame that looks like

    A B C D E rev
[1] T F F T F  2
[2] T T T F F  3
[3] T F F T T  5
[4] T T T F F  2
[5] T F F T F  1

And would like a data.frame that also sums the rev column as follows:

    A B C D E rev total
[1] T F F T F  3    2
[2] T T T F F  5    2 
[3] T F F T T  5    1
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129

3 Answers3

3

An approach with dplyr :

use as.data.frame (or here as_tibble) first if you start from a matrix. In the end you need to have a data.frame anyway as you'll have both numeric and logical in your table.

mat <- matrix(
 c(T, F, F, T, F, T, T, T, F, F, T, F, F, T, T, T, T, T, F, F, T, F, F, T, F),
 ncol = 5,
 byrow = TRUE,
 dimnames = list(NULL, LETTERS[1:5])
)

library(dplyr)
mat %>%
  as_tibble %>%    # convert matrix to tibble, to be able to group
  group_by_all %>% # group by every column so we can count by group of equal values
  tally %>%        # tally will add a count column and keep distinct grouped values
  ungroup          # ungroup the table to be clean
#> # A tibble: 3 x 6
#>   A     B     C     D     E         n
#>   <lgl> <lgl> <lgl> <lgl> <lgl> <int>
#> 1 TRUE  FALSE FALSE TRUE  FALSE     2
#> 2 TRUE  FALSE FALSE TRUE  TRUE      1
#> 3 TRUE  TRUE  TRUE  FALSE FALSE     2

Created on 2018-05-29 by the reprex package (v0.2.0).

And a base solution:

df <- as.data.frame(mat)
df$n <- 1
aggregate(n~.,df,sum)
#      A     B     C     D     E n
# 1 TRUE  TRUE  TRUE FALSE FALSE 2
# 2 TRUE FALSE FALSE  TRUE FALSE 2
# 3 TRUE FALSE FALSE  TRUE  TRUE 1

Or as a one liner: aggregate(n~.,data.frame(mat,n=1),sum)

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
2

count function from plyr is exactly what you are looking for (suppose m is your matrix):

plyr::count(m)

#   x.A   x.B   x.C   x.D   x.E freq
#1 TRUE FALSE FALSE  TRUE FALSE    2
#2 TRUE FALSE FALSE  TRUE  TRUE    1
#3 TRUE  TRUE  TRUE FALSE FALSE    2
989
  • 12,579
  • 5
  • 31
  • 53
2

If you have an object mat as defined in @Moody_Mudskipper's answer, you can do

library(data.table)
dt <- as.data.table(mat)

dt[, .N, by = names(dt)]

#       A     B     C     D     E N
# 1: TRUE FALSE FALSE  TRUE FALSE 2
# 2: TRUE  TRUE  TRUE FALSE FALSE 2
# 3: TRUE FALSE FALSE  TRUE  TRUE 1

Explanation

by = <names> divides the data table into groups of rows, where the value of all the variables in <names> is equal across rows. If you do by = names(dt) it will divide into groups where all variables are equal.

.N is the number of observations in the given group of rows.

For your edit, if your data.frame is named df, you can do

setDT(df) # convert to data table
df[, .(rev = sum(rev), total = .N), by = A:E] # get desired output

#       A     B     C     D     E rev N
# 1: TRUE FALSE FALSE  TRUE FALSE   3 2
# 2: TRUE  TRUE  TRUE FALSE FALSE   5 2
# 3: TRUE FALSE FALSE  TRUE  TRUE   5 1
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38