1

I have a df (partially listed below).

    account    company           sum
1    202003         B01        40.50
2    208001         B01    -71596.73
3    280250         B01     -6652.70
4    200001         B13     44362.77
5    202001         B13     13874.25
6    204001         B13      5744.20
7    204102         B13       295.00
8    285091         B13    317145.15
9    286101         B13     11471.13
10   298001         B13    396096.50

I am trying to add the numbers in the df1$sum column into a new dataframe based on the following criteria: For each company (B01 and B13) I need to aggregate the column df1$sum for all accounts in df1$accounts beginning with ^20 and call it df2$Expenses1 (so in this example that would be 202003 and 208001 for company B01).

Separately I need to aggregate all of the accounts beginning with ^28 and ^29 and call it df2$Expenses2(so for company B01 it would only be account 280250 and for company B13 it would be the sum of accounts 285091, 286101 and298001).

Ultimately, the new data frame df2 should look something like this:

Company   Expenses1   Expenses2
 B01     -71,556.23    -6,652.7
 B13      64,276.22  724,712.78    

Would really appreciate some help with this!

A. Suliman
  • 12,923
  • 5
  • 24
  • 37
A.N.
  • 580
  • 4
  • 15
  • What have you tried? We have a nice [R-FAQ on "How to sum by group?"](https://stackoverflow.com/q/1660124/903061). The only difference for you is you need to take a subset first - subset the account numbers starting with "20", then sum by group; subset the account numbers starting with "28" or "29", then sum by group. – Gregor Thomas Oct 29 '18 at 14:04
  • If you need help with the "account numbers starting with X" part, I would recommend using the `substr` function to pull the first two digits and test on that. You could also use a regex pattern with `grepl`, something like `"^2[89]"` – Gregor Thomas Oct 29 '18 at 14:07
  • 1
    `aggregate(sum ~ company + grepl("^20", account), data=df1, FUN=sum)` – jogo Oct 29 '18 at 14:09

1 Answers1

4

With tidyverse

library(tidyverse)
df %>% 
  group_by(company) %>% 
  summarise(Expenses1 = sum(sum*grepl('^20', account)),
            Expenses2 = sum(sum*grepl('^28|^29', account)))
# # A tibble: 2 x 3
#   company Expenses1 Expenses2
#   <chr>       <dbl>     <dbl>
# 1 B01        -71556   -  6653
# 2 B13         64276    724713

With data table

library(data.table)
setDT(df)

df[, lapply(c(Expenses1 = '^20', 
              Expenses2 = '^28|^29'), 
            function(patt) sum(sum*grepl(patt, account)))
   , by = company]

#    company Expenses1 Expenses2
# 1:     B01 -71556.23   -6652.7
# 2:     B13  64276.22  724712.8

If you really only have two groups, you can create another grouping variable then use dcast or spread e.g.

df[, .(Expenses = sum(sum))
   , by = .(company, 
            acct_type = paste0('Expenses_', ifelse(grepl('^20', account), '20', 'other')))] %>% 
  dcast(company ~ acct_type)


#    company Expenses_20 Expenses_other
# 1:     B01   -71556.23        -6652.7
# 2:     B13    64276.22       724712.8
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
  • I think the first 2 versions will work better as I might have to add more entries to the data frame and group them differently. Nonetheless, thanks for your help – A.N. Oct 29 '18 at 15:55