Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy.

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.
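The three steps above can be sketched in base R, with the equivalent single plyr call shown at the end. This is only an illustrative sketch: the data frame `df` and its columns `group` and `value` are made up for the example, and the plyr call runs only if the package happens to be installed.

```r
# Toy data; the column names `group` and `value` are hypothetical.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# 1. Split: break the data frame into a list of smaller data frames, one per group.
pieces <- split(df, df$group)

# 2. Apply: compute a one-row summary for each piece.
summaries <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# 3. Combine: stack the per-group summaries back into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# The same three steps as one plyr call, run only if plyr is installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(df, "group", plyr::summarise, mean_value = mean(value)))
}
```

In plyr's naming scheme the first two letters of a function give the input and output types, so `ddply` takes a data frame and returns a data frame.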

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.

2465 questions
162 votes, 6 answers

How to select the rows with maximum values in each group with dplyr?

I would like to select the row with the maximum value in each group with dplyr. First I generate some random data to illustrate my question: set.seed(1) df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5)) df$value <- runif(nrow(df)) In plyr, I could use a…
Bangyou (9,462 reputation; 16 gold, 62 silver, 94 bronze badges)

147 votes, 6 answers

Reshape three column data frame to matrix ("long" to "wide" format)

I have a data.frame that looks like this. x a 1 x b 2 x c 3 y a 3 y b 3 y c 2 I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like: a b c x 1 2 3 y 3 3 2 I…
MalteseUnderdog (1,971 reputation; 5 gold, 17 silver, 17 bronze badges)

144 votes, 8 answers

Applying a function to every row of a table using dplyr?

When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row. e.g. data(iris) library(plyr) head( adply(iris, 1, transform , Max.Len= max(Sepal.Length,Petal.Length)) ) …
Stephen Henderson (6,340 reputation; 3 gold, 27 silver, 33 bronze badges)

134 votes, 5 answers

Count number of rows by group using dplyr

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me library(plyr) ddply(mtcars,…
charmee (1,501 reputation; 2 gold, 9 silver, 9 bronze badges)

118 votes, 4 answers

dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories…
eipi10 (91,525 reputation; 24 gold, 209 silver, 285 bronze badges)

100 votes, 6 answers

dplyr: "Error in n(): function should not be called directly"

I am attempting to reproduce one of the examples in the dplyr package but am getting this error message. I am expecting to see a new column n produced with the frequency of each combination. What am I missing? I triple checked that the package is…
Michael Bellhouse (1,547 reputation; 3 gold, 14 silver, 26 bronze badges)

93 votes, 3 answers

What does the dot mean in R – personal preference, naming convention or more?

I am (probably) NOT referring to the "all other variables" meaning like var1~. here. I was pointed to plyr once again and looked into mlply and wondered why parameters are defined with a leading dot like this: function (.data, .fun = NULL, ...,…
Matt Bannert (27,631 reputation; 38 gold, 141 silver, 207 bronze badges)

89 votes, 5 answers

How to create a lag variable within each group?

I have a data.table: require(data.table) set.seed(1) data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7)) data # groups time value # 1: b 1…
xiaodai (14,889 reputation; 18 gold, 76 silver, 140 bronze badges)

75 votes, 5 answers

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I have a data frame that looks like this: #df ID DRUG FED AUC0t Tmax Cmax 1 1 0 100 5 20 2 1 1 200 6 25 3 0 1 NA 2 30 4 0 0 150 6 65 And so on. I want to summarize some…
Amer (2,131 reputation; 3 gold, 23 silver, 38 bronze badges)

59 votes, 1 answer

Why is plyr so slow?

I think I am using plyr incorrectly. Could someone please tell me if this is 'efficient' plyr code? require(plyr) plyr <- function(dd) ddply(dd, .(price), summarise, ss=sum(volume)) A little context: I have a few large aggregation problems and I…
ricardo (8,195 reputation; 7 gold, 47 silver, 69 bronze badges)

56 votes, 8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form: > head(data) Group Score Info 1 1 1 a 2 1 2 b 3 1 3 c 4 2 4 d 5 2 3 e 6 2 1 f I would like to aggregate it following the Score column…
jul635 (794 reputation; 1 gold, 7 silver, 13 bronze badges)

49 votes, 6 answers

Convert data from long format to wide format with multiple measure columns

I am having trouble figuring out the most elegant and flexible way to switch data from long format to wide format when I have more than one measure variable I want to bring along. For example, here's a simple data frame in long format. ID is the…
colonel.triq (593 reputation; 1 gold, 4 silver, 6 bronze badges)

41 votes, 3 answers

R: Is there a good replacement for plyr::rbind.fill in dplyr?

For tidyverse users, dplyr is the new way to work with data. For users trying to avoid the older package plyr, what is the equivalent function to rbind.fill in dplyr?
userJT (11,486 reputation; 20 gold, 77 silver, 88 bronze badges)

40 votes, 5 answers

Object not found error with ddply inside a function

This has really challenged my ability to debug R code. I want to use ddply() to apply the same functions to different columns that are sequentially named; eg. a, b, c. To do this I intend to repeatedly pass the column name as a string and use the…
Look Left (1,305 reputation; 3 gold, 15 silver, 20 bronze badges)

39 votes, 6 answers

R: speeding up "group by" operations

I have a simulation that has a huge aggregate and combine step right in the middle. I prototyped this process using plyr's ddply() function which works great for a huge percentage of my needs. But I need this aggregation step to be faster since I…
JD Long (59,675 reputation; 58 gold, 202 silver, 294 bronze badges)