Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

Split a data structure (data frame, list, array) into smaller pieces;
Apply a function to each piece; then
Combine the results into a data structure.
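
The three steps above can be sketched in base R, without plyr itself. This is a minimal illustration under stated assumptions: the data frame `df`, its columns `group` and `value`, and the per-group mean are all hypothetical choices made for the example, not taken from the package documentation.

```r
# Hypothetical toy data: one grouping column and one numeric column.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split: one data frame per level of `group`.
pieces <- split(df, df$group)

# Apply: summarise each piece (here, an illustrative per-group mean).
summaries <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# Combine: stack the per-group results back into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# With plyr loaded, the same idea is usually written as a single call, e.g.
#   ddply(df, .(group), summarise, mean_value = mean(value))
```

plyr's functions wrap this pattern in one call, with the function name indicating the input and output types (for example, `ddply` takes a data frame and returns a data frame).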

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and has itself been partially succeeded by dplyr.

Other resources

The Split-Apply-Combine Strategy for Data Analysis by Hadley Wickham in the Journal of Statistical Software
Data visualisation in R with ggplot2 and plyr course
Tutorial from useR2009 conference
manipulatr Google Group
Posts on R-bloggers

Related tags

R's dplyr and data.table packages

2465 questions

162 votes, 6 answers

How to select the rows with maximum values in each group with dplyr?

I would like to select the row with the maximum value in each group with dplyr. First, I generate some random data to illustrate my question: set.seed(1); df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5)); df$value <- runif(nrow(df)). In plyr, I could use a…

r dplyr plyr greatest-n-per-group

asked Jun 16 '14 at 06:00

Bangyou


147 votes, 6 answers

Reshape three column data frame to matrix ("long" to "wide" format)

I have a data.frame that looks like this. x a 1 x b 2 x c 3 y a 3 y b 3 y c 2 I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like: a b c x 1 2 3 y 3 3 2 I…

r matrix dataframe plyr reshape

asked Mar 08 '12 at 12:03

MalteseUnderdog


144 votes, 8 answers

Applying a function to every row of a table using dplyr?

When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row. e.g. data(iris) library(plyr) head( adply(iris, 1, transform , Max.Len= max(Sepal.Length,Petal.Length)) ) …

r plyr dplyr

asked Feb 16 '14 at 23:21

Stephen Henderson


134 votes, 5 answers

Count number of rows by group using dplyr

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me library(plyr) ddply(mtcars,…

r dplyr count plyr

asked Mar 31 '14 at 17:11

charmee


118 votes, 4 answers

dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories…

r dplyr plyr tidyr

asked Mar 20 '14 at 03:52

eipi10


100 votes, 6 answers

dplyr: "Error in n(): function should not be called directly"

I am attempting to reproduce one of the examples in the dplyr package but am getting this error message. I am expecting to see a new column n produced with the frequency of each combination. What am I missing? I triple checked that the package is…

r function plyr dplyr conflicting-libraries

asked Apr 02 '14 at 03:44

Michael Bellhouse


votes

3 answers

What does the dot mean in R – personal preference, naming convention or more?

I am (probably) NOT referring to the "all other variables" meaning like var1~. here. I was pointed to plyr once again and looked into mlply and wondered why parameters are defined with a leading dot like this: function (.data, .fun = NULL, ...,…

r coding-style naming-conventions plyr

asked Sep 23 '11 at 08:51

Matt Bannert


votes

5 answers

How to create a lag variable within each group?

I have a data.table: require(data.table) set.seed(1) data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7)) data # groups time value # 1: b 1…

r data.table plyr dplyr

asked Oct 10 '14 at 04:33

xiaodai


votes

5 answers

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I have a data frame that looks like this: #df ID DRUG FED AUC0t Tmax Cmax 1 1 0 100 5 20 2 1 1 200 6 25 3 0 1 NA 2 30 4 0 0 150 6 65 And so on. I want to summarize some…

r plyr dplyr shadowing name-collision

asked Nov 14 '14 at 06:00

Amer


votes

1 answer

Why is plyr so slow?

I think I am using plyr incorrectly. Could someone please tell me if this is 'efficient' plyr code? require(plyr) plyr <- function(dd) ddply(dd, .(price), summarise, ss=sum(volume)) A little context: I have a few large aggregation problems and I…

r dataframe plyr data.table

asked Jul 18 '12 at 02:17

ricardo


votes

8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form: > head(data) Group Score Info 1 1 1 a 2 1 2 b 3 1 3 c 4 2 4 d 5 2 3 e 6 2 1 f I would like to aggregate it following the Score column…

r aggregate plyr greatest-n-per-group

asked Jun 09 '11 at 07:37

jul635

votes

6 answers

Convert data from long format to wide format with multiple measure columns

I am having trouble figuring out the most elegant and flexible way to switch data from long format to wide format when I have more than one measure variable I want to bring along. For example, here's a simple data frame in long format. ID is the…

r dataframe plyr

asked May 14 '12 at 18:33

colonel.triq

votes

3 answers

R: Is there a good replacement for plyr::rbind.fill in dplyr?

For tidyverse users, dplyr is the new way to work with data. For users trying to avoid the older plyr package, what is the equivalent of rbind.fill in dplyr?

r dplyr plyr

asked Jun 09 '17 at 18:20

userJT


votes

5 answers

Object not found error with ddply inside a function

This has really challenged my ability to debug R code. I want to use ddply() to apply the same functions to different columns that are sequentially named, e.g. a, b, c. To do this I intend to repeatedly pass the column name as a string and use the…

r function scope plyr

asked Aug 05 '11 at 10:50

Look Left


votes

6 answers

R: speeding up "group by" operations

I have a simulation that has a huge aggregate and combine step right in the middle. I prototyped this process using plyr's ddply() function which works great for a huge percentage of my needs. But I need this aggregation step to be faster since I…

performance r plyr

asked Sep 10 '10 at 14:39

JD Long

