Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
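
As a rough illustration of the verbs named in the tag description (group_by, summarize, filter, and select), here is a minimal sketch on the built-in mtcars data set. The 100-horsepower threshold and the output column names (n, mean_mpg) are arbitrary choices made for this example, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# mtcars ships with base R; the columns used here are hp, cyl, mpg, and wt.
result <- mtcars %>%
  filter(hp > 100) %>%        # keep rows with more than 100 horsepower
  select(cyl, mpg, wt) %>%    # keep only the columns needed below
  group_by(cyl) %>%           # one group per cylinder count
  summarize(
    n        = n(),           # number of rows in each group
    mean_mpg = mean(mpg),     # average fuel economy per group
    .groups  = "drop"         # return an ungrouped tibble (dplyr >= 1.0.0)
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps chain with %>%.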

36044 questions
903 votes, 5 answers

data.table vs dplyr: can one do something well the other can't or does poorly?

Overview I'm relatively familiar with data.table, not so much with dplyr. I've read through some dplyr vignettes and examples that have popped up on SO, and so far my conclusions are that: data.table and dplyr are comparable in speed, except when…
BrodieG • 51,669 • 9 • 93 • 146

304 votes, 5 answers

Filter rows which contain a certain string

I have to filter a data frame, keeping only those rows that contain the string RTB. I'm using dplyr. d.del <- df %>% group_by(TrackingPixel) %>% summarise(MonthDelivery = as.integer(sum(Revenue))) %>% …
Gianluca • 6,307 • 19 • 44 • 65

285 votes, 10 answers

Use dynamic name for new column/variable in `dplyr`

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated. Example data from iris: library(dplyr) iris <- as_tibble(iris) I've created a function to mutate my…
Timm S. • 5,135 • 6 • 24 • 38

264 votes, 7 answers

Display / print all rows of a tibble (tbl_df)

tibble (previously tbl_df) is a version of a data frame created by the dplyr data frame manipulation package in R. It prevents long table outputs when accidentally calling the data frame. Once a data frame has been wrapped by tibble/tbl_df, is there…
Zhe Zhang • 2,879 • 2 • 14 • 12

264 votes, 8 answers

Extract a dplyr tbl column as a vector

Is there a more succinct way to get one column of a dplyr tbl as a vector, from a tbl with database back-end (i.e. the data frame/table can't be subset directly)? require(dplyr) db <- src_sqlite(tempfile(), create = TRUE) iris2 <- copy_to(db,…
nacnudus • 6,328 • 5 • 33 • 47

241 votes, 10 answers

Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of number of gears by am (automatic/manual) in one go with…
jenswirf • 7,087 • 11 • 45 • 65

226 votes, 5 answers

Can dplyr package be used for conditional mutating?

Can mutate be used when the mutation is conditional (depending on the values of certain columns)? This example helps show what I mean. structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6,…
rdatasculptor • 8,112 • 14 • 56 • 81

202 votes, 6 answers

What does %>% function mean in R?

I have seen the use of %>% (percent greater than percent) function in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?
alfakini • 4,635 • 2 • 26 • 35

200 votes, 6 answers

How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

I started getting a new message (see post title) when running group_by and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output: library(tidyverse) library(hablar) df <- read_csv("year, week,…
Susie Derkins • 2,506 • 2 • 13 • 21

200 votes, 10 answers

Fixing a multiple warning "unknown column"

I have a persistent multiple warning of "unknown column" for all types of commands (e.g., str(x) to installing updates on packages), and not sure how to debug this or fix it. The warning "unknown column" is clearly related to a variable in a tbl_df…
ssp3nc3r • 3,662 • 2 • 13 • 23

199 votes, 10 answers

Select first and last row from grouped data

Question Using dplyr, how do I select the top and bottom observations/rows of grouped data in one statement? Data & Example Given a data frame: df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), …
tospig • 7,762 • 14 • 40 • 79

194 votes, 6 answers

Remove duplicated rows using dplyr

I have a data.frame like this - set.seed(123) df = data.frame(x=sample(0:1,10,replace=T),y=sample(0:1,10,replace=T),z=1:10) > df x y z 1 0 1 1 2 1 0 2 3 0 1 3 4 1 1 4 5 1 0 5 6 0 1 6 7 1 0 7 8 1 0 8 9 1 0 9 10 0 1 10 I would…
Nishanth • 6,932 • 5 • 26 • 38

189 votes, 5 answers

Summarizing multiple columns with dplyr?

I'm struggling a bit with the dplyr-syntax. I have a data frame with different variables and one grouping variable. Now I want to calculate the mean for each column within each group, using dplyr in R. df <- data.frame( a = sample(1:5, n,…
Daniel • 7,252 • 6 • 26 • 38

182 votes, 10 answers

Group by multiple columns in dplyr, using string vector input

I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns. # make data with weird column names that can't be hard coded data = data.frame( asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3],…
sharoz • 6,157 • 7 • 31 • 57

168 votes, 9 answers

Sum across multiple columns with dplyr

My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using dplyr. The data entries in the columns are binary(0,1). I am thinking of a row-wise analog of the…
amo • 3,030 • 4 • 25 • 42