2

I have a dataframe that looks like this:

x <- c(1,2,1,1,4,NA,NA,NA,NA,NA)
y <- c(21,22,23,21,21,NA,NA,NA,NA,NA)
z <- c(NA,NA,NA,NA,NA,1,2,3,4,5)
dat <- data.frame(x,y,z)   

I want to count how many times a value from x occurs in z and then take the value of y that corresponds to that row of x. I'm assuming I'll need to use a for loop or apply. The counts would populate a data frame or matrix with column headings corresponding to the y values. I've looked through Stack Overflow and I've tried using which and %in% with no luck. I've also tried sum but it only looks across a row and sums matching occurrences. I want to look across all rows of the other column of interest (z).

The result would look like this (z as the row labels, y as column labels):

  21  22  23
1  2   0   1
2  0   1   0
3  0   0   0
4  1   0   0
5  0   0   0
b.ram
  • 21
  • 2

3 Answers3

3

You can use base R's table:

dat$x <- factor(dat$x, levels = unique(dat$z));
as.data.frame.matrix(table(dat[, 1:2]));
#  21 22 23
#1  2  0  1
#2  0  1  0
#3  0  0  0
#4  1  0  0
#5  0  0  0

Explanation: Turn column entries x into a factor with levels from z, then use table to count occurrences of entries y per x. The factor(...) command ensures that all possible values from z are considered in the table.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
0

Here is an option using tidyverse where we mutate 'x' with levels specified as the unique values of 'z', then get the count of ' and 'y', remove the NA elements (na.omit) and spread from 'long' to 'wide'

library(dplyr)
library(tidyr)
dat %>%
    mutate(x = factor(x, levels = unique(z))) %>%
    count(x, y) %>% 
    na.omit %>%
    spread(y, n, fill = 0, drop = FALSE) %>%
    select(-x)
# A tibble: 5 x 3
#   `21`  `22`  `23`
#* <dbl> <dbl> <dbl>
#1  2.00  0     1.00
#2  0     1.00  0   
#3  0     0     0   
#4  1.00  0     0   
#5  0     0     0   
akrun
  • 874,273
  • 37
  • 540
  • 662
0

A colleague of mine also provided the following solution:

library(tidyr)
dat$x <- factor(dat$x, levels = c("1", "2", "3", "4"))
reshape2::dcast(dat, x ~ y, fun.aggregate = length, drop = FALSE)
b.ram
  • 21
  • 2