Questions tagged [summarize]

A dplyr function (named summarise(), with summarize() as an alias) that creates a new data frame of summary statistics, computed for each group defined by the grouping variables. When using this tag, state the dplyr version you are using. Mind the spelling of the function name.

summarise() creates a new data frame. It has one (or more) rows for each combination of values of the grouping variables; if there are no grouping variables, the output has a single row (or more, as of dplyr 1.0.0) summarising all observations in the input. It contains one column for each grouping variable and one column for each summary statistic that you specify.
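
A minimal sketch of the behaviour described above, assuming dplyr 1.0.0 or later (needed for the .groups argument). The scores data frame and its column names are hypothetical, made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: group "a" has three rows, group "b" has two.
scores <- tibble(
  team  = c("a", "a", "a", "b", "b"),
  score = c(1, 2, 3, 10, 20)
)

# Grouped: one row per value of `team`, with one column for the grouping
# variable and one column per summary statistic.
by_team <- scores %>%
  group_by(team) %>%
  summarise(n = n(), mean_score = mean(score), .groups = "drop")

# Ungrouped: a single row summarising every observation in the input.
overall <- scores %>%
  summarise(n = n(), mean_score = mean(score))
```

Here by_team has the columns team, n and mean_score, while overall has only n and mean_score because there is no grouping variable to carry over.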

836 questions

200

votes

6 answers

How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

I started getting a new message (see post title) when running group_by and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output: library(tidyverse) library(hablar) df <- read_csv("year, week,…

r dplyr summarize

asked Jun 01 '20 at 20:26

Susie Derkins

104

votes

15 answers

How to get summary statistics by group

I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but all of them do one statistic per call, like aggregate(). data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,…

r dplyr stat summarize r-faq

asked Mar 23 '12 at 22:04

user1289220

1,041
2
8
3

votes

3 answers

What is the pandas equivalent of dplyr summarize/aggregate by multiple functions?

I'm having issues transitioning to pandas from R, where the dplyr package can easily group by and perform multiple summarizations. Please help improve my existing Python pandas code for multiple aggregations: import pandas as pd data = pd.DataFrame( …

python r pandas pandas-groupby summarize

asked Aug 13 '16 at 18:03

B.Mr.W.

votes

3 answers

R - dplyr Summarize and Retain Other Columns

I am grouping data and then summarizing it, but would also like to retain another column. I do not need to do any evaluations of that column's content as it will always be the same as the group_by column. I can add it to the group_by statement but…

r dplyr summarize

asked Aug 23 '16 at 03:58

atclaus

votes

4 answers

Define and apply custom bins on a dataframe

Using Python, I have created the following data frame, which contains similarity values: cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard 1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000…

r dataframe binning summarize

asked Aug 15 '12 at 02:50

add-semi-colons

votes

4 answers

Pass column names as strings to group_by and summarize

Starting with dplyr version 0.7, the functions ending with an underscore, such as summarize_ and group_by_, are deprecated, since we are supposed to use quosures. See: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html I am trying to…

r dplyr summarize rlang quosure

asked Oct 24 '17 at 19:18

witek

votes

3 answers

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep the columns that get excluded because they conflict after grouping. Label <-…

r group-by tidyverse dplyr summarize

asked Oct 03 '17 at 21:13

mckisa

votes

2 answers

How to use dplyr to calculate a weighted mean of two grouped variables

I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then summarize the count for each row. For this we simply have: mtcars %>% group_by(cyl,…

r dplyr weighted-average summarize split-apply-combine

asked Apr 24 '18 at 01:15

ds_guy

votes

1 answer

Using R & dplyr to summarize - group_by, count, mean, sd

I am fairly new to R and even newer to dplyr. I have a small data set comprised of 2 columns - var1 and var2. The var1 column is comprised of num values. The var2 column is comprised of factors with 3 levels - A, B, and C. var1 var2 1 …

r dplyr summarize

asked Jul 25 '19 at 04:18

earlev4

votes

1 answer

tidyverse: count number of a specific level when summarizing

When summarizing after grouping, I would like to count the occurrences of a specific level of another factor. In the working example below, I would like to count the number of "male" levels in each group. I've tried many things with count, tally and so…

r group-by dplyr tidyverse summarize

asked Mar 22 '17 at 14:47

Dominique Makowski

votes

3 answers

How to use R dplyr's summarize to count the number of rows that match a criteria?

I have a dataset that I want to summarize. First, I want the sum of the home and away games, which I can do. However, I also want to know how many outliers (defined as more than 300 points) are within each subcategory (home, away). If I wasn't using…

r dplyr subset counting summarize

asked Apr 19 '22 at 12:20

J.Sabree

votes

3 answers

tidyverse summarize multiple columns but show result as rows

I have data where I want to get a bunch of summary statistics for multiple columns with the tidyverse approach. However, utilizing tidyverse's summarize function, it will create each column statistic as a new column, whereas I would prefer to see…

r dplyr tidyr summarize

asked May 27 '20 at 11:40

deschen

votes

2 answers

r summarize_if with multiple conditions

I'm trying to reduce a df of observations to a single observation (a single row). I would like to use summarize_if to take the mean of numeric columns and the mode of string or factor columns. The code below doesn't work, but I hope it gives the idea. Thanks! #data…

r dplyr mode reduction summarize

asked May 06 '20 at 15:07

fiodeno

votes

2 answers

How to use "summarise" from dplyr with dynamic column names?

I am summarizing group means from a table using the summarize function from the dplyr package in R. I would like to do this dynamically, using a column name string stored in another variable. The following is the "normal" way and it works, of…

r dplyr summarize

asked Jan 30 '20 at 13:25

Vance

votes

3 answers

Pandas: Get per-year counts for Dateranges spanning multiple years

I have a dataframe with records spanning multiple years: WarName | StartDate | EndDate --------------------------------------------- 'fakewar1' 01-01-1990 02-02-1995 'examplewar' 05-01-1990 03-07-1998 (...) …

python pandas date-arithmetic summarize

asked May 20 '18 at 23:09

Jasper
