0

I've got a data frame in R that looks like this:

df1 <- data.frame(id = letters[seq( from = 1, to = 20 )],
                  var1 = sample(1:5,20,replace=T),
                  var2 = sample(1:5,20,replace=T))

Herevar1 and var2 are categorical variables between 1 and 5.

I want to create a matrix where the column headings are var1 1 to 5, the row headings are var2 1 to 5, and the matrix is populated by the count of the numbers of entries that belong to each group.

I've tried using reshape, looked at the lazyeval package, and found questions that are similar like this, but I can't get any to fit the bill.

Community
  • 1
  • 1
tktk234
  • 426
  • 3
  • 12

2 Answers2

1

using dcast from reshape2

df1 %>% reshape2::dcast(var2~var1)
reshape2::dcast(df1,var2 ~ var1)
Vincent Guyader
  • 2,927
  • 1
  • 26
  • 43
1
library(dplyr)
library(tidyr)

set.seed(1)

df1 <- data.frame(id = letters[seq( from = 1, to = 20 )],
                  var1 = sample(1:5,20,replace=T),
                  var2 = sample(1:5,20,replace=T))

df1

#    id var1 var2
# 1   a    2    5
# 2   b    2    2
# 3   c    3    4
# 4   d    5    1
# 5   e    2    2
# 6   f    5    2
# 7   g    5    1
# 8   h    4    2
# 9   i    4    5
# 10  j    1    2
# 11  k    2    3
# 12  l    1    3
# 13  m    4    3
# 14  n    2    1
# 15  o    4    5
# 16  p    3    4
# 17  q    4    4
# 18  r    5    1
# 19  s    2    4
# 20  t    4    3


df1 %>%
  count(var1,var2) %>%                     # count how many times you have each combination
  ungroup %>%
  mutate(var1 = paste0("var1_",var1)) %>%  # update variable values
  spread(var1,n, fill=0) %>%               # reshape dataset
  mutate(var2 = paste0("var2_",var2)) %>%  # update variable values
  print() -> df2

# # A tibble: 5 × 6
#     var2 var1_1 var1_2 var1_3 var1_4 var1_5
#    <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 var2_1      0      1      0      0      3
# 2 var2_2      1      2      0      1      1
# 3 var2_3      1      1      0      2      0
# 4 var2_4      0      1      2      1      0
# 5 var2_5      0      1      0      2      0

If you really prefer having var2 values as row names and not as a column add this

df2 = data.frame(df2)
row.names(df2) = df2$var2
df2$var2 = NULL

df2

#        var1_1 var1_2 var1_3 var1_4 var1_5
# var2_1      0      1      0      0      3
# var2_2      1      2      0      1      1
# var2_3      1      1      0      2      0
# var2_4      0      1      2      1      0
# var2_5      0      1      0      2      0
AntoniosK
  • 15,991
  • 2
  • 19
  • 32