0

Say we have a dataframe looking like this one below:

month     issue          amount
Jan       withdrawal     250
Jan       delay          120
Jan       other          65
Feb       withdrawal     189
Feb       delay          122
Feb       other          89

My goal is to tweak this in order to get a dataframe giving me the percentages of each value in issue related to each month. Plainly speaking, my desired output should look like follows:

month     issue          rate
Jan       withdrawal     57.47
Jan       delay          27.59
Jan       other          14.94
Feb       withdrawal     47.25
Feb       delay          30.50
Feb       other          22.25

I've tried helping myself with dplyr but my attempts had all been unsuccessful so far.

teogj
  • 289
  • 1
  • 11
  • 1
    Does this answer your question? [Summarizing by subgroup percentage in R](https://stackoverflow.com/questions/27134516/summarizing-by-subgroup-percentage-in-r) – Giora Simchoni Jan 26 '21 at 11:47

2 Answers2

1

Try the answer here . it might be what you're looking for. Code is pasted below for your convenience.

library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))

In your case it would probably be something like :

library(dplyr)
group_by(df, month) %>% mutate(rate= amount/sum(amount))

Or alternatively :

group_by(df, month) %>% transmute(issue, rate= amount/sum(amount))
thehand0
  • 1,123
  • 4
  • 14
1
library(tidyverse)
df <- read.table(text = "month     issue          amount
Jan       withdrawal     250
Jan       delay          120
Jan       other          65
Feb       withdrawal     189
Feb       delay          122
Feb       other          89", header = T)

df
#>   month      issue amount
#> 1   Jan withdrawal    250
#> 2   Jan      delay    120
#> 3   Jan      other     65
#> 4   Feb withdrawal    189
#> 5   Feb      delay    122
#> 6   Feb      other     89

df %>% 
  group_by(month) %>% 
  mutate(rate = amount / sum(amount, na.rm = T) * 100)
#> # A tibble: 6 x 4
#> # Groups:   month [2]
#>   month issue      amount  rate
#>   <chr> <chr>       <int> <dbl>
#> 1 Jan   withdrawal    250  57.5
#> 2 Jan   delay         120  27.6
#> 3 Jan   other          65  14.9
#> 4 Feb   withdrawal    189  47.2
#> 5 Feb   delay         122  30.5
#> 6 Feb   other          89  22.2

df %>% 
  group_by(month) %>% 
  mutate(rate = prop.table(amount) * 100)
#> # A tibble: 6 x 4
#> # Groups:   month [2]
#>   month issue      amount  rate
#>   <chr> <chr>       <int> <dbl>
#> 1 Jan   withdrawal    250  57.5
#> 2 Jan   delay         120  27.6
#> 3 Jan   other          65  14.9
#> 4 Feb   withdrawal    189  47.2
#> 5 Feb   delay         122  30.5
#> 6 Feb   other          89  22.2

Created on 2021-01-26 by the reprex package (v0.3.0)

using data.table

library(data.table)
setDT(df)
df[, rate := prop.table(amount) * 100, by = list(month)]
df
#>    month      issue amount     rate
#> 1:   Jan withdrawal    250 57.47126
#> 2:   Jan      delay    120 27.58621
#> 3:   Jan      other     65 14.94253
#> 4:   Feb withdrawal    189 47.25000
#> 5:   Feb      delay    122 30.50000
#> 6:   Feb      other     89 22.25000

Created on 2021-01-26 by the reprex package (v0.3.0)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14