Summarise a logical Matrix

Question

I have a large matrix filled with TRUE/FALSE values in each column. Is there a way to summarise the matrix so that every row is unique, with a new column counting how often that row appeared?

Example:

    A B C D E
[1] T F F T F
[2] T T T F F
[3] T F F T T
[4] T T T F F
[5] T F F T F

Would become:

    A B C D E total
[1] T F F T F  2
[2] T T T F F  2
[3] T F F T T  1

EDIT

I cbind this matrix with a new column rev so I now have a data.frame that looks like

    A B C D E rev
[1] T F F T F  2
[2] T T T F F  3
[3] T F F T T  5
[4] T T T F F  2
[5] T F F T F  1

And would like a data.frame that also sums the rev column as follows:

    A B C D E rev total
[1] T F F T F  3    2
[2] T T T F F  5    2 
[3] T F F T T  5    1

moodymudskipper · Accepted Answer · 2018-05-29T23:01:36.697

3

An approach with dplyr:

Use as.data.frame (or, as here, as_tibble) first if you start from a matrix. You need a data.frame in the end anyway, because the result holds both numeric and logical columns.

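# Rebuild the example from the question, one matrix row per line
mat <- matrix(
  c(TRUE, FALSE, FALSE, TRUE,  FALSE,
    TRUE, TRUE,  TRUE,  FALSE, FALSE,
    TRUE, FALSE, FALSE, TRUE,  TRUE,
    TRUE, TRUE,  TRUE,  FALSE, FALSE,
    TRUE, FALSE, FALSE, TRUE,  FALSE),
  ncol = 5,
  byrow = TRUE,
  dimnames = list(NULL, LETTERS[1:5])
)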

library(dplyr)
mat %>%
  as_tibble %>%    # convert matrix to tibble, to be able to group
  group_by_all %>% # group by every column so we can count by group of equal values
  tally %>%        # tally will add a count column and keep distinct grouped values
  ungroup          # ungroup the table to be clean
#> # A tibble: 3 x 6
#>   A     B     C     D     E         n
#>   <lgl> <lgl> <lgl> <lgl> <lgl> <int>
#> 1 TRUE  FALSE FALSE TRUE  FALSE     2
#> 2 TRUE  FALSE FALSE TRUE  TRUE      1
#> 3 TRUE  TRUE  TRUE  FALSE FALSE     2

Created on 2018-05-29 by the reprex package (v0.2.0).
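For the edit in the question (summing rev as well as counting rows), the same idea might be sketched with group_by and summarise. This is only an illustration: the df below is a hand-built reconstruction of the example in the question, and the name res is made up here.

```r
library(dplyr)

# Assumed reconstruction of the data.frame from the question's edit
df <- data.frame(
  A = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  B = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  C = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  D = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  E = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  rev = c(2, 3, 5, 2, 1)
)

res <- df %>%
  group_by(A, B, C, D, E) %>%                 # one group per distinct logical row
  summarise(rev = sum(rev), total = n()) %>%  # sum rev and count rows within each group
  ungroup()                                   # drop any remaining grouping

res
```

Listing the grouping columns explicitly keeps rev out of the groups, which group_by_all would not do.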

And a base solution:

df <- as.data.frame(mat)
df$n <- 1                    # every row counts once
aggregate(n ~ ., df, sum)    # sum the counts per distinct row
#      A     B     C     D     E n
# 1 TRUE  TRUE  TRUE FALSE FALSE 2
# 2 TRUE FALSE FALSE  TRUE FALSE 2
# 3 TRUE FALSE FALSE  TRUE  TRUE 1

Or as a one-liner: aggregate(n ~ ., data.frame(mat, n = 1), sum)
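The base approach can probably be stretched to the edit in the question by putting both rev and a counter column on the left-hand side of the formula. A minimal sketch, assuming a data.frame shaped like the one in the question (df is rebuilt by hand here, and res is a name chosen for illustration):

```r
# Assumed reconstruction of the data.frame from the question's edit
df <- data.frame(
  A = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  B = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  C = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  D = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  E = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  rev = c(2, 3, 5, 2, 1)
)

df$total <- 1  # every row counts once

# Sum rev and total within each distinct combination of A..E
res <- aggregate(cbind(rev, total) ~ A + B + C + D + E, data = df, FUN = sum)
res
```

The grouping columns are spelled out on the right-hand side so it is clear that rev and total are summed rather than grouped on.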

edited May 29 '18 at 23:01

answered May 29 '18 at 22:30

moodymudskipper


Are you able to break down what exactly is happening in the code? I haven't used the dplyr package before. – Jamie Allan May 29 '18 at 22:39

I added comments. To understand `%>%`, the keywords to search for are pipe, dplyr, and magrittr. – moodymudskipper May 29 '18 at 22:46

@Calum tally doesn't do anything different from count here – moodymudskipper May 29 '18 at 22:47

The base solution is what I'm going with. It's easier for me to understand and I can apply the same line to different columns. Thank you. – Jamie Allan May 29 '18 at 22:55

The `[[<-` is a bit of overkill, you could just do `aggregate(n ~ ., data.frame(mat,n=1), FUN=sum)` – thelatemail May 29 '18 at 22:58
good point, edited – moodymudskipper May 29 '18 at 23:01
@Moody_Mudskipper yeah i know, it's just to reflect the intended use of `tally` vs `count` – Calum You May 29 '18 at 23:11

score 2 · Answer 2 · answered May 29 '18 at 22:35

The count function from plyr does exactly what you are looking for (suppose m is your matrix):

plyr::count(m)

#   x.A   x.B   x.C   x.D   x.E freq
#1 TRUE FALSE FALSE  TRUE FALSE    2
#2 TRUE FALSE FALSE  TRUE  TRUE    1
#3 TRUE  TRUE  TRUE FALSE FALSE    2

IceCreamToucan · Answer 3 · 2018-05-31T00:45:07.627

If you have an object mat as defined in @Moody_Mudskipper's answer, you can do

library(data.table)
dt <- as.data.table(mat)

dt[, .N, by = names(dt)]

#       A     B     C     D     E N
# 1: TRUE FALSE FALSE  TRUE FALSE 2
# 2: TRUE  TRUE  TRUE FALSE FALSE 2
# 3: TRUE FALSE FALSE  TRUE  TRUE 1

Explanation

by = <names> splits the data table into groups of rows that share the same values for all the variables in <names>. With by = names(dt), rows land in the same group only when every column matches.

.N is the number of observations in the given group of rows.
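A toy illustration of these two pieces, using a made-up table toy that is not part of the question:

```r
library(data.table)

# Hypothetical three-row table: rows 1 and 3 are identical
toy <- data.table(g = c("a", "b", "a"), flag = c(TRUE, FALSE, TRUE))

# Group on every column; .N is the number of rows in each group
counts <- toy[, .N, by = names(toy)]
counts
```

Here the ("a", TRUE) group should contain two rows and the ("b", FALSE) group one.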

For your edit, if your data.frame is named df, you can do

setDT(df) # convert to data table
df[, .(rev = sum(rev), total = .N), by = A:E] # get desired output

#       A     B     C     D     E rev total
# 1: TRUE FALSE FALSE  TRUE FALSE   3     2
# 2: TRUE  TRUE  TRUE FALSE FALSE   5     2
# 3: TRUE FALSE FALSE  TRUE  TRUE   5     1
