Questions tagged [summarize]

A dplyr function (canonically named summarise(); summarize() is an alias) that creates a new data frame by summarising data according to given grouping variables. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
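As a minimal sketch of the behaviour described above, the following compares an ungrouped and a grouped call. The data frame `df` and its columns `g1`, `g2` and `value` are invented for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data, invented for illustration only
df <- tibble(
  g1    = c("a", "a", "b", "b", "b"),
  g2    = c("x", "y", "x", "x", "y"),
  value = c(1, 2, 3, 4, 5)
)

# No grouping variables: a single row summarising all observations
overall <- df %>%
  summarise(n = n(), mean_value = mean(value))

# Two grouping variables: one row per observed (g1, g2) combination,
# with one column per grouping variable and one per summary statistic.
# .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0).
by_group <- df %>%
  group_by(g1, g2) %>%
  summarise(n = n(), mean_value = mean(value), .groups = "drop")

print(overall)
print(by_group)
```

Here `overall` has one row, while `by_group` has four rows (one for each combination of `g1` and `g2` present in the data) and the columns `g1`, `g2`, `n` and `mean_value`.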

836 questions
200
votes
6 answers

How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

I started getting a new message (see post title) when running group_by and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output: library(tidyverse) library(hablar) df <- read_csv("year, week,…
Susie Derkins
  • 2,506
  • 2
  • 13
  • 21
104
votes
15 answers

How to get summary statistics by group

I'm trying to get multiple summary statistics in R/S-PLUS grouped by a categorical column in one shot. I found a couple of functions, but all of them do one statistic per call, like aggregate(). data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,…
user1289220
  • 1,041
  • 2
  • 8
  • 3
58
votes
3 answers

What is the pandas equivalent of dplyr summarize/aggregate by multiple functions?

I'm having issues transitioning to pandas from R, where the dplyr package can easily group by and perform multiple summarizations. Please help improve my existing Python pandas code for multiple aggregations: import pandas as pd data = pd.DataFrame( …
B.Mr.W.
  • 18,910
  • 35
  • 114
  • 178
44
votes
3 answers

R - dplyr Summarize and Retain Other Columns

I am grouping data and then summarizing it, but would also like to retain another column. I do not need to do any evaluations of that column's content as it will always be the same as the group_by column. I can add it to the group_by statement but…
atclaus
  • 1,046
  • 1
  • 9
  • 12
35
votes
4 answers

Define and apply custom bins on a dataframe

Using Python, I have created the following data frame, which contains similarity values: cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard 1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000…
add-semi-colons
  • 18,094
  • 55
  • 145
  • 232
18
votes
4 answers

Pass column names as strings to group_by and summarize

Starting with dplyr version 0.7, the methods ending with an underscore, such as summarize_ and group_by_, are deprecated, since we are supposed to use quosures. See: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html I am trying to…
witek
  • 984
  • 1
  • 8
  • 25
11
votes
3 answers

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because they conflict after grouping. Label <-…
mckisa
  • 155
  • 2
  • 2
  • 7
10
votes
2 answers

How to use dplyr to calculate a weighted mean of two grouped variables

I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then summarize the count for each row. For this we simply have: mtcars %>% group_by(cyl,…
ds_guy
  • 143
  • 2
  • 5
8
votes
1 answer

Using R & dplyr to summarize - group_by, count, mean, sd

I am fairly new to R and even newer to dplyr. I have a small data set comprised of 2 columns - var1 and var2. The var1 column is comprised of num values. The var2 column is comprised of factors with 3 levels - A, B, and C. var1 var2 1 …
earlev4
  • 83
  • 1
  • 5
7
votes
1 answer

tidyverse: count number of a specific level when summarizing

I would like, when summarizing after grouping, to count the number of a specific level of another factor. In the working example below, I would like to count the number of "male" levels in each group. I've tried many things with count, tally and so…
Dominique Makowski
  • 1,511
  • 1
  • 13
  • 30
6
votes
3 answers

How to use R dplyr's summarize to count the number of rows that match a criteria?

I have a dataset that I want to summarize. First, I want the sum of the home and away games, which I can do. However, I also want to know how many outliers (defined as more than 300 points) are within each subcategory (home, away). If I wasn't using…
J.Sabree
  • 2,280
  • 19
  • 48
6
votes
3 answers

tidyverse summarize multiple columns but show result as rows

I have data where I want to get a bunch of summary statistics for multiple columns with the tidyverse approach. However, utilizing tidyverse's summarize function, it will create each column statistic as a new column, whereas I would prefer to see…
deschen
  • 10,012
  • 3
  • 27
  • 50
6
votes
2 answers

r summarize_if with multiple conditions

I'm trying to reduce a df of observations to a single observation (single line). I would like to use summarize_if to take the mean of numeric columns and the mode of string or factor columns. The code below doesn't work, but I hope it gives the idea. Thanks! #data…
fiodeno
  • 77
  • 8
6
votes
2 answers

How to use "summarise" from dplyr with dynamic column names?

I am summarizing group means from a table using the summarize function from the dplyr package in R. I would like to do this dynamically, using a column name string stored in another variable. The following is the "normal" way and it works, of…
Vance
  • 127
  • 6
6
votes
3 answers

Pandas: Get per-year counts for Dateranges spanning multiple years

I have a dataframe with records spanning multiple years: WarName | StartDate | EndDate --------------------------------------------- 'fakewar1' 01-01-1990 02-02-1995 'examplewar' 05-01-1990 03-07-1998 (...) …
Jasper
  • 2,131
  • 6
  • 29
  • 61