Count co-occurrence of values and create data frame of count

Question

I have a dataframe that looks like this:

x <- c(1,2,1,1,4,NA,NA,NA,NA,NA)
y <- c(21,22,23,21,21,NA,NA,NA,NA,NA)
z <- c(NA,NA,NA,NA,NA,1,2,3,4,5)
dat <- data.frame(x,y,z)

I want to count how many times a value from x occurs in z and then take the value of y that corresponds to that row of x. I'm assuming I'll need to use a for loop or apply. The counts would populate a data frame or matrix with column headings corresponding to the y values. I've looked through Stack Overflow and I've tried using which and %in% with no luck. I've also tried sum but it only looks across a row and sums matching occurrences. I want to look across all rows of the other column of interest (z).

The result would look like this (z as the row labels, y as column labels):

  21  22  23
1  2   0   1
2  0   1   0
3  0   0   0
4  1   0   0
5  0   0   0

Maurits Evers · Answer 1 · 2018-03-11T06:47:41.830

3

You can use base R's table:

dat$x <- factor(dat$x, levels = unique(dat$z));
as.data.frame.matrix(table(dat[, 1:2]));
#  21 22 23
#1  2  0  1
#2  0  1  0
#3  0  0  0
#4  1  0  0
#5  0  0  0

Explanation: Turn column entries x into a factor with levels from z, then use table to count occurrences of entries y per x. The factor(...) command ensures that all possible values from z are considered in the table.

edited Mar 11 '18 at 06:47

answered Mar 11 '18 at 06:36

Maurits Evers

49,617
4
47
68

Crisp and tricky answer! – MKR Mar 11 '18 at 07:30
1

Thanks @MKR! This is one of those cases where the `NA`s work with you not against in keeping it concise:-) – Maurits Evers Mar 11 '18 at 08:39

score 0 · Answer 2 · answered Mar 11 '18 at 07:37

Here is an option using tidyverse where we mutate 'x' with levels specified as the unique values of 'z', then get the count of ' and 'y', remove the NA elements (na.omit) and spread from 'long' to 'wide'

library(dplyr)
library(tidyr)
dat %>%
    mutate(x = factor(x, levels = unique(z))) %>%
    count(x, y) %>% 
    na.omit %>%
    spread(y, n, fill = 0, drop = FALSE) %>%
    select(-x)
# A tibble: 5 x 3
#   `21`  `22`  `23`
#* <dbl> <dbl> <dbl>
#1  2.00  0     1.00
#2  0     1.00  0   
#3  0     0     0   
#4  1.00  0     0   
#5  0     0     0

score 0 · Answer 3 · answered Mar 14 '18 at 16:48

0

A colleague of mine also provided the following solution:

library(tidyr)
dat$x <- factor(dat$x, levels = c("1", "2", "3", "4"))
reshape2::dcast(dat, x ~ y, fun.aggregate = length, drop = FALSE)

answered Mar 14 '18 at 16:48

b.ram

21
2

Count co-occurrence of values and create data frame of count

3 Answers3