Groupby one column then sum up another value column and count rows then calculate percentage for each in R

Question

Given a small dataset as follows:

df <- structure(list(id = 1:8, type = structure(c(1L, 1L, 1L, 2L, 2L, 
3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), values = c(360000L, 
331715L, 260000L, 164900L, NA, 120000L, 331238L, 629861L)), class = "data.frame", row.names = c(NA, 
-8L))

How could I groupby type and sum up values and count numbers of entries, then calculate value_percent and number_percent for each type?

The expected result will like this:

Thanks for your help at advance.

Update:

value_percent become all 1s if Chinese characters inside dataset for @Karthik S's solution.

df <- structure(list(物业类型 = structure(c(1L, 3L, 2L, 1L, 3L, 
4L, 3L, 3L, 4L, 4L, 4L, 3L), .Label = c("商业/零售", "数据中心", 
"写字楼", "综合体"), class = "factor"), 成交总价.万元. = c(360000L, 
331715L, 260000L, 164900L, NA, 120000L, 331238L, 629861L, 68800L, 
47600L, 804600L, 450000L)), class = "data.frame", row.names = c(NA, 
-12L))

Code:

df %>% 
  group_by(物业类型) %>% 
  dplyr::summarise(总额占比 = sum(成交总价.万元., na.rm = T)/sum(成交总价.万元., na.rm = T), 笔数占比 = n()/nrow(df))

Out:

score 2 · Accepted Answer · answered Oct 16 '20 at 11:57

2

You can do :

library(dplyr)

df %>%
  group_by(type) %>%
  summarise(value_percent = sum(values, na.rm = TRUE),
            count_percent = n()) %>%
  mutate(across(ends_with('percent'), prop.table))

#  type  value_percent count_percent
#  <fct>         <dbl>         <dbl>
#1 a            0.433          0.375
#2 b            0.0750         0.25 
#3 c            0.492          0.375

answered Oct 16 '20 at 11:57

Ronak Shah

377,200
20
156
213

It works after I use `detach(package:plyr)`. Many thanks. – ah bon Oct 16 '20 at 12:06
I test with new data, `prop.table` function seems not working when the values are `int`? – ah bon Feb 03 '21 at 09:20
I don't think so `prop.table(1L:10L)` works fine. – Ronak Shah Feb 03 '21 at 09:23
I get it, I need to pass column names to `prop.table`, with `df %>% dplyr::group_by(type) %>% dplyr::summarise(count = n()) %>% arrange(desc(count)) %>% mutate(per = prop.table(count))`, it works. – ah bon Feb 03 '21 at 10:36
But it seems not equivalent to @Karthik S's solution, which is `df %>% dplyr::group_by(type) %>% dplyr::summarise(per = n()/nrow(df)) %>% arrange(desc(per))`. – ah bon Feb 03 '21 at 10:43

score 2 · Answer 2 · answered Oct 16 '20 at 12:01

2

Does this work:

> df %>% group_by(type) %>% summarise(value_percent = sum(values,na.rm = T)/sum(df$values, na.rm = T), count_percent = n()/nrow(df))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
  type  value_percent count_percent
  <fct>         <dbl>         <dbl>
1 a            0.433          0.375
2 b            0.0750         0.25 
3 c            0.492          0.375
>

answered Oct 16 '20 at 12:01

Karthik S

11,348
2
11
25

I tested your method with my real data, all the `value_percent` become `1`, I don't know why. It works for the data in the question, maybe because of `NaN`s? – ah bon Oct 16 '20 at 12:15
1

No if there's a NaN, na.rm will work on that. For ex: > sum(c(1,2,NaN)) [1] NaN > sum(c(1,2,NaN), na.rm = T) [1] 3 > – Karthik S Oct 16 '20 at 12:24
It seems this only happens if data has chinese characters inside, I'll update the data and my code for you to test. – ah bon Oct 16 '20 at 12:27

Groupby one column then sum up another value column and count rows then calculate percentage for each in R

2 Answers2