
I have a large data set with two columns. Here is a small representation of it:

l1 <- data.frame(c1 = c("A","A","A","B","B","C","D","D"), c2 = c("cat","dog","cow","pig","dog","horse","cat","goat"))

I need to create a frequency matrix with c1 as the columns and c2 as the rows, where each cell holds the number of times that value of c2 occurs together with that value of c1.

The output should look something like this:

       c1
c2      A B C D
  cat   1 0 0 1
  cow   1 0 0 0
  dog   1 1 0 0
  goat  0 0 0 1
  horse 0 0 1 0
  pig   0 1 0 0

I tried table() and xtabs(). They work on this small example but not on my actual, very large data set. A solution without for loops would also be helpful, given the size of the data. Thanks!
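
If table() and xtabs() fail because the dense result does not fit in memory (an assumption, since the question does not say how they fail), one option worth trying is the `sparse = TRUE` argument of xtabs(). It returns a sparse matrix from the Matrix package, which stores only the non-zero counts. A minimal sketch on the example data:

```r
library(Matrix)  # provides the sparse matrix class returned by xtabs(sparse = TRUE)

l1 <- data.frame(
  c1 = c("A", "A", "A", "B", "B", "C", "D", "D"),
  c2 = c("cat", "dog", "cow", "pig", "dog", "horse", "cat", "goat")
)

# Rows are the c2 values, columns are the c1 values. Only non-zero
# counts are stored. sparse = TRUE requires exactly two variables
# on the right-hand side of the formula.
freq <- xtabs(~ c2 + c1, data = l1, sparse = TRUE)

freq["cat", "A"]  # count for a single cell
dim(freq)         # distinct c2 values by distinct c1 values
```

Whether this helps on the real data depends on how many distinct values c1 and c2 have and how many of the combinations actually occur.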

Phil
jerry1
  • Does this answer your question? [Faster ways to calculate frequencies and cast from long to wide](https://stackoverflow.com/questions/8186133/faster-ways-to-calculate-frequencies-and-cast-from-long-to-wide) – jblood94 Oct 27 '22 at 12:42
  • With `data.table`: `dcast(setDT(l1), c1~c2, length)` – jblood94 Oct 27 '22 at 12:44
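
The `dcast()` call in the comment above puts c1 in the rows. Below is a sketch of the same idea with c2 in the rows, as the question asks. It assumes the data.table package is installed; `as.data.table()` is used instead of `setDT()` only so that `l1` itself is left unmodified, and `value.var` is set explicitly to avoid relying on data.table's guess.

```r
library(data.table)

l1 <- data.frame(
  c1 = c("A", "A", "A", "B", "B", "C", "D", "D"),
  c2 = c("cat", "dog", "cow", "pig", "dog", "horse", "cat", "goat")
)

dt <- as.data.table(l1)  # a copy, so l1 stays a plain data.frame

# One row per c2 value, one column per c1 value. length() counts the
# rows in each (c2, c1) combination; combinations that never occur
# are filled with length() of an empty vector, i.e. 0.
wide <- dcast(dt, c2 ~ c1, value.var = "c1", fun.aggregate = length)
wide
```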

1 Answer

library(tidyverse)

l1 %>% 
  group_by(c1) %>% 
  count(c2) %>%  
  pivot_wider(names_from = c1, values_from = n, names_sort = TRUE) 

# A tibble: 6 x 5
  c2        A     B     C     D
  <chr> <int> <int> <int> <int>
1 cat       1    NA    NA     1
2 cow       1    NA    NA    NA
3 dog       1     1    NA    NA
4 pig      NA     1    NA    NA
5 horse    NA    NA     1    NA
6 goat     NA    NA    NA     1

To replace the NAs with 0:

l1 %>% 
  group_by(c1) %>% 
  count(c2) %>%  
  pivot_wider(names_from = c1, values_from = n, names_sort = TRUE) %>% 
  replace(is.na(.), 0)

# A tibble: 6 x 5
  c2        A     B     C     D
  <chr> <int> <int> <int> <int>
1 cat       1     0     0     1
2 cow       1     0     0     0
3 dog       1     1     0     0
4 pig       0     1     0     0
5 horse     0     0     1     0
6 goat      0     0     0     1
Chamkrai
  • Hi, thanks for your answer. I tried your code but got the following error: Error: ! Assigned data `values` must be compatible with existing data. ℹ Error occurred for column `c2`. ✖ Can't convert to . Run `rlang::last_error()` to see where the error occurred – jerry1 Oct 27 '22 at 15:27
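
The error in the comment above points at the `replace(is.na(.), 0)` step, which tries to write a numeric 0 into every column that has NAs, including `c2`. The types in the message are cut off, so the exact cause is a guess (for example, `c2` being a factor or containing NAs in the real data). One way to sidestep that step entirely is the `values_fill` argument of `pivot_wider()`, which fills only the count columns. A minimal sketch on the example data, assuming dplyr and tidyr are installed:

```r
library(dplyr)
library(tidyr)

l1 <- data.frame(
  c1 = c("A", "A", "A", "B", "B", "C", "D", "D"),
  c2 = c("cat", "dog", "cow", "pig", "dog", "horse", "cat", "goat")
)

wide <- l1 %>%
  count(c2, c1) %>%              # one row per (c2, c1) pair, count in column n
  pivot_wider(
    names_from  = c1,            # c1 values become the column names
    values_from = n,
    names_sort  = TRUE,
    values_fill = 0              # missing combinations become 0, c2 is untouched
  )

wide
```

Because `values_fill` applies only to the columns created from `n`, the type of `c2` no longer matters for the fill.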