
I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3 day bender. So for the sake of my dog and my liver, what's a concise syntax for doing a ddply operation on every row of a data frame?

Here's an example that works well for a simple case:

library(plyr)
x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x, y)
# group on every column, so each row ends up as its own group
ddply(df, names(df), function(df) max(df$x, df$y))

That works fine and gives me what I want. But if things get more complex, plyr gets funky (and not like Bootsy Collins), because it is chewing on making "levels" out of all those floating-point values:

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
ddply(df, names(df), function(df) max(df$x, df$y))

On my box this chews for a few minutes and then returns:

Error: memory exhausted (limit reached?)
In addition: Warning messages:
1: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)

I think I am totally abusing plyr and I am not saying this is a bug in plyr, but rather abusive behavior by me (liver and dog notwithstanding).

So in short, is there a syntax shortcut for using ddply to operate on every row, as a substitute for apply(X, 1, ...)?

The workaround I've been using is to create a "key" that gives a unique value for every row and then I can join back to it.

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
# make the key
df$myKey <- 1:nrow(df)
myOut <- merge(df, ddply(df, "myKey", function(df) max(df$x, df$y)))
# knock out the key
myOut$myKey <- NULL

But I keep thinking that "There Has to Be a Better Way"

Thanks!

JD Long
  • Just a thought, but does taking a transpose `t(df)` of the dataframe work for you? – Bob Albright Jan 15 '10 at 21:07
  • It "works" in that it returns the transpose. But I don't see an angle of how that gets me toward a solution. But remember, I'm not very smart (I'm an economist), so you may have to spell it out for me. – JD Long Jan 15 '10 at 21:11
  • You can skip the merge step with `ddply(df,"myKey", transform, max = max(x, y))` – hadley Jan 15 '10 at 21:16
  • I clearly don't grok transform. Not even a little. – JD Long Jan 15 '10 at 21:32
  • Is there a reason you can't just do `pmax(df$x, df$y)`? – Jonathan Chang Jan 15 '10 at 22:06
  • Jonathan, for this simple example there are probably a number of ways I could do this without plyr. I always try to do really simple examples for my questions. My actual application is much more complex, but if I can do the simple example here I can abstract it to the more complex one. Thanks for the recommendation, though. – JD Long Jan 18 '10 at 16:15
  • lol - I end up kicking the dog and going on a 3 day bender – Nicholas Hamilton May 03 '15 at 11:56
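
The suggestion in the comments above, passing `transform` to `ddply` together with the row key, can be sketched roughly as follows. This is a minimal sketch that assumes plyr is installed; the fixed seed, the smaller row count, and the `rowMax` column name are illustrative choices that do not appear in the original post.

```r
library(plyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))

# one unique key per row, so ddply treats each row as its own group
df$myKey <- seq_len(nrow(df))

# transform() adds the new column to each one-row group, so no merge is needed
# (rowMax is a hypothetical column name chosen for this sketch)
myOut <- ddply(df, "myKey", transform, rowMax = max(x, y))

# knock out the key
myOut$myKey <- NULL

# for this simple case the result should agree with the vectorised pmax()
agrees <- isTRUE(all.equal(myOut$rowMax, pmax(df$x, df$y)))
```

Because the key is an integer, splitting on it avoids the cost of building factor levels from every floating-point column, which appears to be what exhausted memory in the question.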

1 Answer


Just treat it like an array and work on each row:

adply(df, 1, transform, max = max(x, y))
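
As a rough check of the one-liner above, the sketch below runs it on a small data frame and compares the new column with the `apply(X, 1, ...)` form the question wants to avoid. It assumes plyr is installed; the seed, the row count, and the `viaApply` name are illustrative and not part of the original answer.

```r
library(plyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))

# .margins = 1 splits df into one-row data frames;
# transform() is then called on each piece and the pieces are recombined
out <- adply(df, 1, transform, max = max(x, y))

# the same row-wise maximum via apply(), for comparison
# (viaApply is a hypothetical name used only in this sketch)
viaApply <- unname(apply(df[, c("x", "y")], 1, max))
sameResult <- isTRUE(all.equal(out$max, viaApply))
```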
hadley
  • I feel like I'm having a teachable moment... but I'm not grasping what is before me. Can you explain the role of the transform function? I read the docs for both adply and transform and I'm not grasping how this combination works. Is transform the function that adply is doing on each margin? And max is called after the transform? I'm boggled. – JD Long Jan 15 '10 at 21:31
  • `transform` takes n + 1 arguments. The first argument is the data frame you want to transform - that's what plyr passes in for you. The n arguments are expressions giving the new columns that you want. – hadley Jan 15 '10 at 23:16
  • Hey, that's amazingly useful! I just wrapped my brain around it. Thank you for taking the time to spoon feed me. In my simple mind I thought that the transform functionality was built into the **ply functions. Now I see exactly how to use transform. I'll use it in working code today! – JD Long Jan 18 '10 at 16:18
  • See also summarize, arrange, mutate, numcolwise etc, instead of transform. Very very useful. – Alex Brown Aug 04 '11 at 06:11
  • Is there a way to drop the original columns and just return the new column (max in this case)? – savagent Jul 23 '14 at 02:33
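
The description of `transform` in the comments above can be tried in base R without plyr. The sketch below uses a made-up three-row data frame, and the column names `total` and `bigger` are illustrative. It also shows why the data has to be split into rows first: called on the whole data frame, `max(x, y)` collapses to a single number.

```r
df <- data.frame(x = c(1, 5, 3), y = c(4, 2, 6))

# first argument: the data frame; remaining arguments: name = expression,
# each evaluated with the columns of df in scope
res <- transform(df, total = x + y, bigger = pmax(x, y))
# res$total is 5, 7, 9 and res$bigger is 4, 5, 6

# max() reduces all values of x and y to one number, which transform() recycles
whole <- transform(df, max = max(x, y))
# whole$max is 6 in every row; adply/ddply avoid this by handing
# transform() one row at a time
```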