
Must one melt a data frame prior to having it cast? From ?melt:

data    molten data frame, see melt.

In other words, is it absolutely necessary to have a data frame molten prior to any acast or dcast operation?

Consider the following:

library("reshape2")
library("MASS")

xb <- dcast(Cars93, Manufacturer ~ Type, mean, value.var="Price")
m.Cars93 <- melt(Cars93, id.vars=c("Manufacturer", "Type"), measure.vars="Price")
xc <- dcast(m.Cars93, Manufacturer ~ Type, mean, value.var="value")

Then:

> identical(xb, xc)
[1] TRUE

So in this case the melt operation seems to have been redundant.

What are the general guiding rules in these cases? How do you decide when a data frame needs to be molten prior to a *cast operation?

landroni
    As long as the existing data is already in long-format, I don't see any general need to melt it before casting. – talat Aug 05 '14 at 11:21

1 Answer


Whether or not you need to melt your dataset depends on what form you want the final data to be in and how that relates to what you currently have.

The way I generally think of it is:

  1. For the LHS of the formula, I should have one or more columns that will become my "id" rows. These will remain as separate columns in the final output.
  2. For the RHS of the formula, I should have one or more columns whose values define the new columns across which my values will be "spread". When there is more than one such column, dcast creates the new columns from the combinations of their values.
  3. I must have just one column that would feed the values to fill in the resulting "grid" created by these rows and columns.

To illustrate with a small example, consider this tiny dataset:

mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

Imagine that our possible value variables are columns "D" or "E", but we are only interested in the values from "E". Imagine also that our primary "id" is column "A", and we want to spread the values out according to column "B". Column "C" is irrelevant at this point.

With that scenario, we would not need to melt the data first. We could simply do:

library(reshape2)
dcast(mydf, A ~ B, value.var = "E")
#   A a b  c
# 1 A 6 7 NA
# 2 B 8 9 10

Compare what happens when you do the following, keeping in mind my three points above:

dcast(mydf, A ~ C, value.var = "E")
dcast(mydf, A ~ B + C, value.var = "E")
dcast(mydf, A + B ~ C, value.var = "E")
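For reference, here is a sketch of those three calls run on the same `mydf`, with comments on what to expect. The result names (`res_count`, `res_cols`, `res_rows`) are only illustrative, and the comments assume reshape2's default behaviour when no `fun.aggregate` is supplied.

```r
library(reshape2)

mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

# A ~ C: several rows fall into the same cell (e.g. A = "A", C = 1),
# so dcast has to aggregate and falls back to counting rows with length().
res_count <- dcast(mydf, A ~ C, value.var = "E")

# A ~ B + C: each B/C combination present in the data becomes its own column.
res_cols <- dcast(mydf, A ~ B + C, value.var = "E")

# A + B ~ C: every A/B pair is its own row, so each cell holds at most one value.
res_rows <- dcast(mydf, A + B ~ C, value.var = "E")
```

The first call is the one to watch: the cells contain counts rather than values from "E", which signals that the formula does not uniquely identify each value (point 3).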

When is melt required?

Now, let's make one small adjustment to the scenario: We want to spread out the values from both columns "D" and "E" with no actual aggregation taking place. With this change, we need to melt the data first so that the relevant values that need to be spread out are in a single column (point 3 above).

dfL <- melt(mydf, measure.vars = c("D", "E"))
dcast(dfL, A ~ B + variable, value.var = "value")
#   A a_D a_E b_D b_E c_D c_E
# 1 A   1   6   2   7  NA  NA
# 2 B   3   8   4   9   5  10
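To see why the melt step satisfies point 3, it can help to inspect the intermediate long data frame. The following is a minimal sketch using the same `mydf`; it relies on `melt` treating every column not listed in `measure.vars` as an id column, which is its default.

```r
library(reshape2)

mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

# Stack D and E into one "value" column; "variable" records which
# original column each value came from. A, B and C are kept as id columns.
dfL <- melt(mydf, measure.vars = c("D", "E"))

names(dfL)  # A, B, C, variable, value
nrow(dfL)   # one row per original row and measure column

# With a single value column available, "variable" can take part in the formula.
wide <- dcast(dfL, A ~ B + variable, value.var = "value")
```

Because "variable" is now an ordinary column, it can go on either side of the formula, just like "B" or "C".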
A5C1D2H2I1M1N2O1R2T1
  • Thank you so much for the thorough explanation. If I understand correctly, the last `melt` operation assumes that `c("D", "E")` are identical in nature, i.e. they're exactly the same measurement spread across groups and registered in two different columns. – landroni Aug 06 '14 at 20:58
  • @landroni, they should not necessarily be the same measurement, but at least the same data type (for example, all numeric, and not logical, character, or otherwise), on which you would want to do the same type of aggregation. The [discussion that Arun and I were having](http://stackoverflow.com/q/25143428/1270695) on your other question might shed some more light on this topic. – A5C1D2H2I1M1N2O1R2T1 Aug 07 '14 at 03:42
  • Indeed, the measurements can be different (but should be of same data type), although it seems to me that this example points to a certain fundamental rigidity in the way `acast`/`dcast` operate, if you intend to use them for pivot tables. – landroni Aug 07 '14 at 08:22