1

I have a report that i need to do on a quarterly basis that involves adding various components of revenue together to formulate a trailing 12 month and trailing 24 month total.

rather than retyping a bunch of column names to add each column together on a rolling basis i was hoping to create a function where i could declare variables for the trailing months so i can sum them together easier.

my dataframe all_rel contains all the data i need to sum together. it contains the following fields (unfortunately i just inherited this report an it isn't exactly in tidy format)

Total_Processing_Revenue Ancillary_Revenue

in the data frame i have T24 months of these data points within separate columns

the script that someone put together that i inherited uses the following to add the columns together:

    all_rel$anci_rev_cy_ytd = all_rel$X201701Ancillary_Revenue+all_rel$X201702Ancillary_Revenue+all_rel$X201703Ancillary_Revenue+...+all_rel$X201712Ancillary_Revenue

i'm trying was hoping to do something with paste but can't seem to get it to work

    dfname <- 'all_rel$X'
    revmonth1 <- '01'
    revmonth2 <- '02'
    revmonth3 <- '03'
    revmonth4 <- '04'
    revmonth5 <- '05'
    revmonth6 <- '06'
    revmonth7 <- '07'
    revmonth8 <- '08'
    revmonth9 <- '09'
    revmonth10 <- '10'
    revmonth11 <- '11'
    revmonth12 <- '12'
    cy <- '2017'
    py <- '2016'
    rev1 <- 'Total_Processing_Revenue'
    rev2 <- 'Ancillary_Revenue'


    all_rel$anci_rev_py_ytd =  paste(dfname,py,revmonth1,rev2, sep ='')+paste(dfname,py,revmonth2,rev2, sep ='')+...paste(dfname,py,revmonth12,rev2, sep ='')

when i try to sum these fields together i get a "non-numeric argument to binary operator" error. Is there something else i can do instead of what i've been trying to do?

paste(rpt,py,revmonth1,rev2, sep ='') returns "all_rel$X201601Ancillary_Revenue"

is there a way that I can tell R that the reason why I'm pasting these names is to reference the data within them rather than the text I'm pasting?

i'm fairly new to R (i've been learning on the fly to try to make my life easier.

ultimately i need to figure out how to convert this mess to a tidy data format where each of the revenue columns has a month and year but i was hoping to use this issue to understand how to use substitution logic to better automate processes. Maybe i just worded my searches incorrectly but i was struggling to find the exact issue i'm trying to solve.

Any help is greatly appreciated.

::edit::

added dput(head)

    structure(list(Chain = c("000001", "000029", "000060", "000064","000076", "000079"), X201601Net_Revenue = c(-2.92, 25005.14,55787.59, 3996.69, 14229.41, 3455.85),X201601Total_Processing_Revenue = c(0,16140.48, 23238.89, 3574.17, 4093.51, 641.1), X201601Ancillary_Revenue = c(-2.92,8864.66, 32548.7, 422.52, 10135.9, 2814.75), X201602Net_Revenue = c(0,41918.84, 56696.34, 4789.57, 13113.2, 5211.27), X201602Total_Processing_Revenue = c(0,13253.19, 24733.04, 4395.69, 4102.79, 546.68), X201602Ancillary_Revenue = c(0,28665.65, 31963.3, 393.88, 9010.41, 4664.59), X201603Net_Revenue = c(0,23843.76, 62494.51, 5262.87, 20551.79, 7646.75), X201603Total_Processing_Revenue = c(0,15037.39, 27523.19,4792.63,4805.61,2134.72)),.Names=c("Chain","X201601Net_Revenue","X201601Total_Processing_Revenue","X201601Ancillary_Revenue","X201602Net_Revenue","X201602Total_Processing_Revenue","X201602Ancillary_Revenue","X201603Net_Revenue", "X201603Total_Processing_Revenue"), row.names = c(NA,6L), class = "data.frame")
MarkM
  • 13
  • 4
  • Using R this way - pasting code together to try to evaluate it - will make your life harder not easier. Share a little sample data and show your desired output and we can help you solve your problem in a nice way rather than helping you with this fragile, difficult to maintain hack. – Gregor Thomas Jan 30 '18 at 15:19
  • Use `dput()` to share data in a copy/pasteable way. [Also see this tutorial for creating reproducible examples in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Gregor Thomas Jan 30 '18 at 15:20
  • You can't sum characters in R... – s_baldur Jan 30 '18 at 15:21
  • the issue is the dataset is such a mess that i couldn't fit the full output. after i figure out how to introduce a bit more automation i'm going to work to transition the data into a more tidy format. i have added a sample of the data. – MarkM Jan 30 '18 at 17:21

1 Answers1

0

Here's how to tidy your data (calling your data dd):

    library(tidyr)
    library(dplyr)

    gather(dd, key = key, value = value, -Chain) %>%
        mutate(year = substr(key, start = 2, 5),
                     month = substr(key, 6, 7),
                     metric = substr(key, 8, nchar(key))) %>%
        select(-key) %>%
        spread(key = metric, value = value)

    #     Chain year month Ancillary_Revenue Net_Revenue Total_Processing_Revenue
    # 1  000001 2016    01             -2.92       -2.92                     0.00
    # 2  000001 2016    02              0.00        0.00                     0.00
    # 3  000001 2016    03                NA        0.00                     0.00
    # 4  000029 2016    01           8864.66    25005.14                 16140.48
    # 5  000029 2016    02          28665.65    41918.84                 13253.19
    # 6  000029 2016    03                NA    23843.76                 15037.39
    # 7  000060 2016    01          32548.70    55787.59                 23238.89
    # 8  000060 2016    02          31963.30    56696.34                 24733.04
    # 9  000060 2016    03                NA    62494.51                 27523.19
    # 10 000064 2016    01            422.52     3996.69                  3574.17
    # 11 000064 2016    02            393.88     4789.57                  4395.69
    # 12 000064 2016    03                NA     5262.87                  4792.63
    # 13 000076 2016    01          10135.90    14229.41                  4093.51
    # 14 000076 2016    02           9010.41    13113.20                  4102.79
    # 15 000076 2016    03                NA    20551.79                  4805.61
    # 16 000079 2016    01           2814.75     3455.85                   641.10
    # 17 000079 2016    02           4664.59     5211.27                   546.68
    # 18 000079 2016    03                NA     7646.75                  2134.72

With that done, you can use whatever grouped operations you want - sums, rolling sums or averages, etc. You might be interested in the yearmon class provided in the zoo package, this question on rolling sums by group, and of course the R-FAQ on grouped sums.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • well that probably solves all my issues. thanks for working with me. to make this a bit more of a reasonable process. i'll work to understand the functions you used a bit better – MarkM Jan 30 '18 at 18:24
  • i hope you don't mind if i ask a few more questions about your gather transform operation. This dataset is so messy that it actually has 174 columns of data. i'm trying to understand how you are using the arguements within your solution as i'm struggling to adapt them to my dataset. i have added an edit with the column names that weren't in my original dput – MarkM Jan 30 '18 at 21:56
  • [Read this for some background](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html), and then do the cleaning one step at a time, looking at what each step does. – Gregor Thomas Jan 30 '18 at 21:58
  • thanks. while i was trying to figure out how to word my edit i think i figured it out, though i'll have to test. – MarkM Jan 30 '18 at 22:29
  • This was exactly what i needed. i stepped through my entire script and made everything tidy. Thank you so much!!! – MarkM Jan 31 '18 at 15:46
  • Yeah - you seemed to want to work around the messy data for now and tidy later, but the workarounds would be much much messier than just tidying first. – Gregor Thomas Jan 31 '18 at 15:48