I am trying to solve the following problem in R.

I have a data.frame that looks like this (the real one is much bigger):

Column_1     Column_2     Column_3
(0-1]        (15-25]      58
(2-3]        (35-45]      25
(4-5]        (35-45]      50
(0-1]        (15-25]      5
(2-3]        (25-35]      10
(1-2]        (25-35]      15
(1-2]        (15-25]      12
(3-4]        (25-35]      10
(4-5]        (35-45]      9

The goal is to build a matrix from this data.frame with the Column_1 values as row names and the Column_2 values as column names, where each cell holds the mean of the Column_3 values for that combination of Column_1 and Column_2 (and 0 where a combination does not occur).

The resulting matrix should be something like this:

      (15-25]    (25-35]     (35-45]
(0-1]   31.5      0             0
(1-2]   12        15            0
(2-3]   0         10            25     
(3-4]   0         10            0
(4-5]   0         0             29.5

How can I do this?

A.C.
  • Possible duplicate of [R create table from 3 columns](https://stackoverflow.com/questions/40293982/r-create-table-from-3-columns) – storaged Sep 12 '18 at 19:58
  • Possible duplicate of [Reshape three column data frame to matrix ("long" to "wide" format)](https://stackoverflow.com/questions/9617348/reshape-three-column-data-frame-to-matrix-long-to-wide-format) – divibisan Sep 14 '18 at 21:19

2 Answers


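xtabs() and aggregate() do the job: aggregate() computes the mean of Column_3 for each Column_1/Column_2 pair, and xtabs() spreads those means into a table.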

as.data.frame.matrix(xtabs(Column_3 ~ Column_1 + Column_2,
                           aggregate(Column_3 ~ Column_1 + Column_2, df, mean)))
# output
#       (15-25] (25-35] (35-45]
# (0-1]    31.5       0     0.0
# (1-2]    12.0      15     0.0
# (2-3]     0.0      10    25.0
# (3-4]     0.0      10     0.0
# (4-5]     0.0       0    29.5

# data (define df before running the call above)
df <- structure(list(Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]", 
"(2-3]", "(1-2]", "(1-2]", "(3-4]", "(4-5]"), Column_2 = c("(15-25]", 
"(35-45]", "(35-45]", "(15-25]", "(25-35]", "(25-35]", "(15-25]", 
"(25-35]", "(35-45]"), Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L, 
12L, 10L, 9L)), .Names = c("Column_1", "Column_2", "Column_3"
), class = "data.frame", row.names = c(NA, -9L))
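
The one-liner above packs two steps into a single call. As a rough sketch of what happens in between, the same computation can be split up as follows. The intermediate names means_long, means_tab and means_mat are made up for illustration, and the data is rebuilt with data.frame() only so the block runs on its own.

```r
# Same sample data as above, rebuilt with data.frame() for readability.
df <- data.frame(
  Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]", "(2-3]",
               "(1-2]", "(1-2]", "(3-4]", "(4-5]"),
  Column_2 = c("(15-25]", "(35-45]", "(35-45]", "(15-25]", "(25-35]",
               "(25-35]", "(15-25]", "(25-35]", "(35-45]"),
  Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L, 12L, 10L, 9L),
  stringsAsFactors = FALSE
)

# Step 1: one row per observed (Column_1, Column_2) pair, holding the mean.
means_long <- aggregate(Column_3 ~ Column_1 + Column_2, data = df, FUN = mean)

# Step 2: spread the long table into a Column_1 x Column_2 table.
# xtabs() sums within each cell; after step 1 each cell has at most one row,
# so the sum equals the mean, and unobserved pairs come out as 0.
means_tab <- xtabs(Column_3 ~ Column_1 + Column_2, data = means_long)

# Step 3 (optional): turn the table into a plain numeric matrix.
means_mat <- as.matrix(as.data.frame.matrix(means_tab))
means_mat
```

Note that the zeros come from xtabs() summing over nothing, so a 0 in the result can mean either "no observations" or "a true mean of 0". If that distinction matters, the empty cells would need separate handling.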
nghauran

We can use dcast from reshape2. Calling your data dd:

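# value.var names the column to aggregate; without it dcast() guesses and prints a message
wide = reshape2::dcast(data = dd, Column_1 ~ Column_2, value.var = "Column_3",
                       fun.aggregate = mean, fill = 0)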
wide
#   Column_1 (15-25] (25-35] (35-45]
# 1    (0-1]    31.5       0     0.0
# 2    (1-2]    12.0      15     0.0
# 3    (2-3]     0.0      10    25.0
# 4    (3-4]     0.0      10     0.0
# 5    (4-5]     0.0       0    29.5

That's a data frame; we can of course convert it to a matrix:

mat = as.matrix(wide[, -1])
row.names(mat) = wide[, 1]
mat
#       (15-25] (25-35] (35-45]
# (0-1]    31.5       0     0.0
# (1-2]    12.0      15     0.0
# (2-3]     0.0      10    25.0
# (3-4]     0.0      10     0.0
# (4-5]     0.0       0    29.5
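
If installing reshape2 is not an option, a similar matrix can be built in base R with tapply(). This is a different technique from the dcast() call above, shown only as a minimal sketch; it assumes the data is in dd as before, and the block rebuilds dd so that it runs on its own.

```r
# Sample data from the question, named dd as in this answer.
dd <- data.frame(
  Column_1 = c("(0-1]", "(2-3]", "(4-5]", "(0-1]", "(2-3]",
               "(1-2]", "(1-2]", "(3-4]", "(4-5]"),
  Column_2 = c("(15-25]", "(35-45]", "(35-45]", "(15-25]", "(25-35]",
               "(25-35]", "(15-25]", "(25-35]", "(35-45]"),
  Column_3 = c(58L, 25L, 50L, 5L, 10L, 15L, 12L, 10L, 9L),
  stringsAsFactors = FALSE
)

# tapply() with two grouping vectors returns a matrix directly:
# rows follow Column_1, columns follow Column_2, cells hold the group mean.
mat <- tapply(dd$Column_3, list(dd$Column_1, dd$Column_2), mean)

# Combinations that never occur are NA; replace them with 0 to match
# the layout asked for in the question.
mat[is.na(mat)] <- 0
mat
```

Leaving the NA values in place instead of filling them with 0 keeps "no data" distinguishable from a genuine mean of 0, which may be preferable depending on how the matrix is used later.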
Gregor Thomas