When is plyr better than data.table?

Question

"Better" here can mean faster, easier to read (shorter syntax), or it could mean that the task can't be done in data.table at all.

I don't use plyr much and would like to know if there are cases where I should. Because I don't use it much, the only example I can come up with is `rbind.fill`, which to my knowledge doesn't have a data.table analog. In every other example I've seen of something being done in both plyr and data.table, the latter was faster and easier to read/more compact.
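A minimal sketch of the `rbind.fill` case, using two made-up data.frames `a` and `b` with partly different columns. Newer data.table releases (after this 2013 thread) added a `fill` argument to `rbindlist()`, which covers the same use; treat that as a version-dependent detail and check your installed data.table.

```r
library(plyr)
library(data.table)

# Two toy data.frames that share only column x (hypothetical example data)
a <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
b <- data.frame(x = 3L, z = TRUE)

# plyr: stack rows, filling columns missing from a piece with NA
filled_plyr <- rbind.fill(a, b)

# data.table: rbindlist with fill = TRUE matches columns by name and pads with NA
filled_dt <- rbindlist(list(a, b), fill = TRUE)
```

Both results have three rows and the union of columns `x`, `y`, `z`, with `NA` wherever a piece lacked a column.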

**plyr** will not (in general) be faster than **data.table**. Some people (like myself) find the former's syntax far more intuitive and readable than the latter. But that is merely a subjective choice. — joran, Apr 22 '13 at 18:30
@Arun thanks, I'll take a look at those functions. Does `plyr` do anything better for `data.frame`s? — eddi, Apr 22 '13 at 18:40
@Arun, cool thanks. The parallel stuff sounds interesting, I'll take a look at it. — eddi, Apr 22 '13 at 19:10
Just my 2ct: for multidimensional arrays, plain `array` is much faster than `aaply`. — Paul Hiemstra, Apr 22 '13 at 19:23
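A small sketch contrasting `aaply` with a base R route on a toy matrix. The comment does not name the base function it means; using `apply()` here is an assumption, and no timings are shown or implied.

```r
library(plyr)

# Toy 2 x 3 integer matrix: row 1 is 1, 3, 5 and row 2 is 2, 4, 6
m <- matrix(1:6, nrow = 2)

# plyr: split along margin 1 (rows), sum each piece, combine the results
via_plyr <- aaply(m, 1, sum)

# Base R equivalent over the same margin
via_base <- apply(m, 1, sum)
```

Both give the row sums 9 and 12; `aaply` may attach index names, so compare with `as.vector()` if you check equality.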

score 14 · Accepted Answer · answered Apr 22 '13 at 18:30

They are different packages with different purposes. One is not a substitute for the other, despite there being a small subset of functionality for which they overlap.

Here is a brief summary of each package, from the packages themselves:

The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.

and

data.table ... offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix.
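A minimal sketch of two features from that description, "fast update" and "fast ordered joins", on a toy table with hypothetical columns `id` and `v`. The join is the `A[B]` idea: indexing a keyed table with another table of key values.

```r
library(data.table)

dt <- data.table(id = c(3L, 1L, 2L), v = c(30, 10, 20))

# Sort dt by id and mark id as its key (enables keyed joins)
setkey(dt, id)

# Fast update: add column v2 by reference, without copying dt
dt[, v2 := v * 2]

# Fast ordered join: rows of dt whose key matches the ids passed in J(...)
lookup <- dt[J(c(2L, 3L))]
```

After `setkey`, `dt` is ordered by `id`, and `lookup` holds the rows for ids 2 and 3 together with the new `v2` column.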

They overlap in "fast grouping", which plyr also does by splitting a data.frame, operating on the pieces, and recombining them into a single data.frame. data.table has many other features that make operations on data.frame-like structures fast; plyr has features that apply the split-apply-combine paradigm to other data structures such as lists and arrays (both as inputs and outputs).
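A minimal sketch of that overlapping case: the same per-group mean computed with plyr's `ddply` and with data.table's grouping. The data.frame `df` and its columns `g` and `v` are invented for illustration.

```r
library(plyr)
library(data.table)

df <- data.frame(g = c("b", "a", "b", "a", "c"), v = c(1, 2, 3, 4, 5),
                 stringsAsFactors = FALSE)

# plyr: split df by g, summarise each piece, combine into one data.frame
res_plyr <- ddply(df, "g", summarise, m = mean(v))

# data.table: grouping happens inside [ ]; keyby also sorts the result by g
dt <- as.data.table(df)
res_dt <- dt[, list(m = mean(v)), keyby = g]
```

`ddply` returns the groups sorted, so `keyby` is used to get the same order; plain `by = g` would keep the order in which groups first appear.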

So, really, they are two different tools with a small area of overlap in the same problem domain. Each does much more than that, and if you want or need that additional functionality, use the package that provides it.

Sounds like you're saying that `plyr` does some things that `data.table` can't; that's exactly what I'm looking for. Can you please give an example or two? Thanks — eddi, Apr 22 '13 at 18:37
`library("plyr"); example("llply")` Or really, any of the `**ply` functions other than `ddply`. — Brian Diggs, Apr 22 '13 at 18:42
`llply` doesn't seem like a good one for this purpose (as far as I can see, it does very little on top of what `lapply` already does), but the other ones do. I'll take a look at those functions and maybe resurrect this question after that; for now this'll do, thanks — eddi, Apr 22 '13 at 19:09
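A small sketch of the list-input `**ply` family on a made-up named list `x`. `llply` (list in, list out) matches `lapply` here, while `laply` and `ldply` change the output type, which is the part with no direct data.table counterpart.

```r
library(plyr)

x <- list(a = 1:3, b = 4:6)
double_it <- function(v) v * 2

# llply: list in, list out; compared against base lapply
same_as_lapply <- isTRUE(all.equal(llply(x, double_it), lapply(x, double_it)))

# laply: list in, array out; each piece returns length 3, so a 2 x 3 matrix
as_matrix <- laply(x, double_it)

# ldply: list in, data.frame out; names of x become an .id column
as_df <- ldply(x, function(v) data.frame(mean = mean(v)))
```

`as_matrix` has one row per list element, and `as_df` has one row per element with its name in `.id`.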
