pivot and count membership of two variables

Question

I've got a data frame in R that looks like this:

df1 <- data.frame(id = letters[seq( from = 1, to = 20 )],
                  var1 = sample(1:5,20,replace=T),
                  var2 = sample(1:5,20,replace=T))

Herevar1 and var2 are categorical variables between 1 and 5.

I want to create a matrix where the column headings are var1 1 to 5, the row headings are var2 1 to 5, and the matrix is populated by the count of the numbers of entries that belong to each group.

I've tried using reshape, looked at the lazyeval package, and found questions that are similar like this, but I can't get any to fit the bill.

Please add your expected output. Also, use `set.seed` for reproducibility. — Ronak Shah, Feb 19 '17 at 11:40

score 1 · Accepted Answer · answered Feb 19 '17 at 11:47

1

using dcast from reshape2

df1 %>% reshape2::dcast(var2~var1)
reshape2::dcast(df1,var2 ~ var1)

answered Feb 19 '17 at 11:47

Vincent Guyader

2,927
1
26
43

score 1 · Answer 2 · answered Feb 19 '17 at 11:57

library(dplyr)
library(tidyr)

set.seed(1)

df1 <- data.frame(id = letters[seq( from = 1, to = 20 )],
                  var1 = sample(1:5,20,replace=T),
                  var2 = sample(1:5,20,replace=T))

df1

#    id var1 var2
# 1   a    2    5
# 2   b    2    2
# 3   c    3    4
# 4   d    5    1
# 5   e    2    2
# 6   f    5    2
# 7   g    5    1
# 8   h    4    2
# 9   i    4    5
# 10  j    1    2
# 11  k    2    3
# 12  l    1    3
# 13  m    4    3
# 14  n    2    1
# 15  o    4    5
# 16  p    3    4
# 17  q    4    4
# 18  r    5    1
# 19  s    2    4
# 20  t    4    3


df1 %>%
  count(var1,var2) %>%                     # count how many times you have each combination
  ungroup %>%
  mutate(var1 = paste0("var1_",var1)) %>%  # update variable values
  spread(var1,n, fill=0) %>%               # reshape dataset
  mutate(var2 = paste0("var2_",var2)) %>%  # update variable values
  print() -> df2

# # A tibble: 5 × 6
#     var2 var1_1 var1_2 var1_3 var1_4 var1_5
#    <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 var2_1      0      1      0      0      3
# 2 var2_2      1      2      0      1      1
# 3 var2_3      1      1      0      2      0
# 4 var2_4      0      1      2      1      0
# 5 var2_5      0      1      0      2      0

If you really prefer having var2 values as row names and not as a column add this

df2 = data.frame(df2)
row.names(df2) = df2$var2
df2$var2 = NULL

df2

#        var1_1 var1_2 var1_3 var1_4 var1_5
# var2_1      0      1      0      0      3
# var2_2      1      2      0      1      1
# var2_3      1      1      0      2      0
# var2_4      0      1      2      1      0
# var2_5      0      1      0      2      0

pivot and count membership of two variables

2 Answers2