0

Given a small dataset as follows:

df <- structure(list(id = 1:8, type = structure(c(1L, 1L, 1L, 2L, 2L, 
3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), values = c(360000L, 
331715L, 260000L, 164900L, NA, 120000L, 331238L, 629861L)), class = "data.frame", row.names = c(NA, 
-8L))

enter image description here

How could I groupby type and sum up values and count numbers of entries, then calculate value_percent and number_percent for each type?

The expected result will like this:

enter image description here

Thanks for your help at advance.

Update:

value_percent become all 1s if Chinese characters inside dataset for @Karthik S's solution.

df <- structure(list(物业类型 = structure(c(1L, 3L, 2L, 1L, 3L, 
4L, 3L, 3L, 4L, 4L, 4L, 3L), .Label = c("商业/零售", "数据中心", 
"写字楼", "综合体"), class = "factor"), 成交总价.万元. = c(360000L, 
331715L, 260000L, 164900L, NA, 120000L, 331238L, 629861L, 68800L, 
47600L, 804600L, 450000L)), class = "data.frame", row.names = c(NA, 
-12L))

Code:

df %>% 
  group_by(物业类型) %>% 
  dplyr::summarise(总额占比 = sum(成交总价.万元., na.rm = T)/sum(成交总价.万元., na.rm = T), 笔数占比 = n()/nrow(df))

Out:

enter image description here

ah bon
  • 9,293
  • 12
  • 65
  • 148

2 Answers2

2

You can do :

library(dplyr)

df %>%
  group_by(type) %>%
  summarise(value_percent = sum(values, na.rm = TRUE),
            count_percent = n()) %>%
  mutate(across(ends_with('percent'), prop.table))

#  type  value_percent count_percent
#  <fct>         <dbl>         <dbl>
#1 a            0.433          0.375
#2 b            0.0750         0.25 
#3 c            0.492          0.375
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • It works after I use `detach(package:plyr)`. Many thanks. – ah bon Oct 16 '20 at 12:06
  • I test with new data, `prop.table` function seems not working when the values are `int`? – ah bon Feb 03 '21 at 09:20
  • I don't think so `prop.table(1L:10L)` works fine. – Ronak Shah Feb 03 '21 at 09:23
  • I get it, I need to pass column names to `prop.table`, with `df %>% dplyr::group_by(type) %>% dplyr::summarise(count = n()) %>% arrange(desc(count)) %>% mutate(per = prop.table(count))`, it works. – ah bon Feb 03 '21 at 10:36
  • But it seems not equivalent to @Karthik S's solution, which is `df %>% dplyr::group_by(type) %>% dplyr::summarise(per = n()/nrow(df)) %>% arrange(desc(per))`. – ah bon Feb 03 '21 at 10:43
2

Does this work:

> df %>% group_by(type) %>% summarise(value_percent = sum(values,na.rm = T)/sum(df$values, na.rm = T), count_percent = n()/nrow(df))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
  type  value_percent count_percent
  <fct>         <dbl>         <dbl>
1 a            0.433          0.375
2 b            0.0750         0.25 
3 c            0.492          0.375
> 
Karthik S
  • 11,348
  • 2
  • 11
  • 25
  • I tested your method with my real data, all the `value_percent` become `1`, I don't know why. It works for the data in the question, maybe because of `NaN`s? – ah bon Oct 16 '20 at 12:15
  • 1
    No if there's a NaN, na.rm will work on that. For ex: > sum(c(1,2,NaN)) [1] NaN > sum(c(1,2,NaN), na.rm = T) [1] 3 > – Karthik S Oct 16 '20 at 12:24
  • It seems this only happens if data has chinese characters inside, I'll update the data and my code for you to test. – ah bon Oct 16 '20 at 12:27