I'm trying to use the daply function in the plyr package, but I cannot get it to produce the output I expect. Even though the variable that makes up the matrix is numeric, the elements of the resulting matrix are lists, not the values of the variable. Here is a small subset of the data for the sake of example:

   Month Vehicle Samples
1 Oct-10   31057     256
2 Oct-10   31059     316
3 Oct-10   31060     348
4 Nov-10   31057     267
5 Nov-10   31059     293
6 Nov-10   31060     250
7 Dec-10   31057     159
8 Dec-10   31059     268
9 Dec-10   31060     206

And I would like to be able to visualize the data in a matrix format, which would look something like this:

  Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206

Here are two alternative calls that I have tried (the latter because my original data frame has more columns than I show here):

daply(DF, .(Vehicle, Month), identity)
daply(DF,.(Vehicle,Month), colwise(identity,.(Samples)))

However what I get instead is rather abstruse:

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057 List,3 List,3 List,3
  31059 List,3 List,3 List,3
  31060 List,3 List,3 List,3

I used the str function on the output as some commenters have suggested, and here is an excerpt:

List of 9
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 1
  ..$ Samples: int 256
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 2
  ..$ Samples: int 316

What am I missing? Also, is there a way to do this simply with the base packages? Thanks!

Below is the dput output of the data frame if you'd like to reproduce this:

structure(list(Month = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("Oct-10", "Nov-10", "Dec-10"), class = c("ordered", 
"factor")), Vehicle = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("31057", "31059", "31060"), class = "factor"), 
    Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 
    206L)), .Names = c("Month", "Vehicle", "Samples"), class = "data.frame", row.names = c(NA, 
9L))
JD Margulici
  • 2
    A bit more information would be useful. Try str(DF) and paste the output in the question. Or use dput(DF) to provide people with your data to work with if it's not too big (subset it down if it is). – nzcoops Aug 10 '11 at 05:27
  • 1
    It's not obvious what you are trying to do here. It seems that you are attempting some kind of reshape of the data, since `identity` doesn't perform any operations on its arguments. Please tell us what your expected results are. – Andrie Aug 10 '11 at 07:00
  • 1
    See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example on how to make your code in your question reproducible. – Roman Luštrik Aug 10 '11 at 08:26
  • Thanks for your edits; the question is now much better! This is indeed called reshaping the data; searching for this term (with the R tag) gives several results that should be helpful to you: http://stackoverflow.com/search?q=%5Br%5D+reshape I've also answered your question below to specifically say why `identity` didn't work. – Aaron left Stack Overflow Aug 11 '11 at 02:18
  • http://stackoverflow.com/a/9617424/210673 now has a list of the various ways to do this. – Aaron left Stack Overflow Mar 23 '12 at 16:06

2 Answers

The identity function isn't what you want here; from the help page, "All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure." The simpler pieces in this case are subsets of the original data frame with unique Vehicle/Month combinations; the identity function just returns that subset, and these subsets are then used to fill the resulting matrix.

That is, each element of the matrix you got is a data frame (which is a type of list) with the rows with that Month/Vehicle combination.

> try1 <- daply(DF, .(Vehicle, Month), identity)
> try1[1,1]
[[1]]
   Month Vehicle Samples
1 Oct-10   31057     256
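
The split step described above can be imitated in base R without plyr. This is only a rough sketch of the idea, not how daply is implemented internally; the names `pieces`, `vals` and `out` are made up for the illustration, and `DF` is rebuilt here from the data in the question.

```r
# Rebuild the question's data frame (same values as the dput output).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Split: one small data frame per Vehicle/Month combination
# (Vehicle varies fastest in the resulting list).
pieces <- split(DF, list(DF$Vehicle, DF$Month))

# Apply: returning each piece unchanged (like identity) leaves data frames;
# extracting Samples leaves a single number per piece.
vals <- sapply(pieces, function(x) x$Samples)

# Combine: arrange the nine numbers into a Vehicle-by-Month matrix.
out <- matrix(vals, nrow = nlevels(DF$Vehicle),
              dimnames = list(Vehicle = levels(DF$Vehicle),
                              Month   = levels(DF$Month)))
out
```

Each element of `pieces` is a one-row data frame, which is why a function that returns its input unchanged fills the matrix with lists.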

You instead want to use a function that just gets the Samples portion of that data frame, like this:

daply(DF, .(Vehicle, Month), function(x) x$Samples)

which results in

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206

A few alternate ways of doing this are with cast from the reshape package (which returns a data frame)

cast(DF, Vehicle~Month, value="Samples")

or with dcast and acast, the revised versions in reshape2; the first returns a data frame, the second a matrix

dcast(DF, Vehicle~Month, value.var="Samples")
acast(DF, Vehicle~Month, value.var="Samples")

with xtabs from the stats package

xtabs(Samples ~ Vehicle + Month, DF)

or by hand, which isn't hard at all using matrix indexing; almost all the code is just setting up the matrix.

with(DF, {
  out <- matrix(nrow=nlevels(Vehicle), ncol=nlevels(Month),
                dimnames=list(Vehicle=levels(Vehicle), Month=levels(Month)))
  out[cbind(Vehicle, Month)] <- Samples
  out
})
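
Another base-only possibility, not mentioned in the original answer, is tapply. The sketch below assumes each Vehicle/Month combination occurs exactly once, so the choice of `sum` as the summary function is arbitrary; `DF` is rebuilt from the question's data and `m` is just an illustrative name.

```r
# Rebuild the question's data frame (same values as the dput output).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Cross-classify Samples by Vehicle (rows) and Month (columns).
# With one observation per cell, sum() simply returns that observation.
m <- with(DF, tapply(Samples, list(Vehicle = Vehicle, Month = Month), sum))
m
```

If some combinations were missing from the data, the corresponding cells would be NA here, whereas xtabs would report 0.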

The reshape function in the stats package can also be used to do this, but the syntax is difficult and I haven't used it once since learning cast and melt from the reshape package.
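
For completeness, here is roughly what the stats::reshape route might look like for this data. Treat it as a sketch: `wide` and `m` are illustrative names, `DF` is rebuilt from the question's data, and the result is a data frame whose value columns are named after the measured variable and the month.

```r
# Rebuild the question's data frame (same values as the dput output).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Long to wide: one row per Vehicle, one Samples column per Month.
wide <- reshape(DF, idvar = "Vehicle", timevar = "Month", direction = "wide")

# Optional: drop the id column and keep the values as a matrix.
m <- as.matrix(wide[-1])
rownames(m) <- as.character(wide$Vehicle)
m
```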

Aaron left Stack Overflow
  • Thanks, even more helpful than I could have wished! It also helped me get over a misconception about the d*ply function, i.e. it first creates subsets that are data frames. – JD Margulici Aug 11 '11 at 15:48
If we take the OP at their word(s) in the title, then they may be looking for data.matrix() which is a standard function in the base package that is always available in R.

data.matrix() works by converting any factors to their numeric coding before converting the data frame to a matrix. Consider the following data frame:

dat <- data.frame(A = 1:10, B = factor(sample(c("X","Y"), 10, replace = TRUE)))

If we convert via as.matrix() we get a character matrix:

> head(as.matrix(dat))
     A    B  
[1,] " 1" "X"
[2,] " 2" "X"
[3,] " 3" "Y"
[4,] " 4" "Y"
[5,] " 5" "Y"
[6,] " 6" "Y"

or, if we convert via matrix(), we get a list with dimensions (a list array, as mentioned in the Value section of ?daply, by the way):

> head(matrix(dat))
     [,1]      
[1,] Integer,10
[2,] factor,10 
> str(matrix(dat))
List of 2
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ : Factor w/ 2 levels "X","Y": 1 1 2 2 2 2 1 2 2 1
 - attr(*, "dim")= int [1:2] 2 1

data.matrix(), however, does the intended thing:

> mat <- data.matrix(dat)
> head(mat)
     A B
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 2
[6,] 6 2
> str(mat)
 int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "A" "B"
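
One caveat that follows from the "numeric coding" behaviour: a factor whose labels look like numbers, such as Vehicle in the question, is converted to its internal codes (1, 2, 3), not to the numbers shown in its labels. A small sketch, with `d`, `m_codes` and `m_values` as made-up names:

```r
# A factor with number-like labels, as in the question's Vehicle column.
d <- data.frame(Vehicle = factor(c("31057", "31059", "31060")),
                Samples = c(256L, 316L, 348L))

# data.matrix() uses the internal factor codes: Vehicle becomes 1, 2, 3.
m_codes <- data.matrix(d)

# To keep the label values, convert via character before building the matrix.
d$Vehicle <- as.numeric(as.character(d$Vehicle))
m_values <- data.matrix(d)

m_codes
m_values
```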
Gavin Simpson