0

I have the following data.frame:

> tail(contacts.byChannel.weekly, 20)

    WEEK_START   WEEK_END COMM_TYPE_CODE  CONTACT_CHANNEL TOTAL_CONTACTS TOTAL_HMD TOTAL_HMD_NO        NRR
 1: 2015-05-03 2015-05-09          PHONE    PHONE - OTHER            326       104           14 0.13461538
 2: 2015-05-03 2015-05-09          PHONE PHONE - OTHER_DD            313        89            8 0.08988764
 3: 2015-05-10 2015-05-16           CHAT             CHAT            576       132           20 0.15151515
 4: 2015-05-10 2015-05-16          EMAIL            EMAIL            933       124           37 0.29838710
 5: 2015-05-10 2015-05-16          PHONE      PHONE - C2C            203        50           12 0.24000000
 6: 2015-05-10 2015-05-16          PHONE   PHONE - GOOGLE            197        48            3 0.06250000
 7: 2015-05-10 2015-05-16          PHONE    PHONE - OTHER            487       166           25 0.15060241
 8: 2015-05-10 2015-05-16          PHONE PHONE - OTHER_DD            334        90           12 0.13333333
 9: 2015-05-17 2015-05-23           CHAT             CHAT            568       107           17 0.15887850
10: 2015-05-17 2015-05-23          EMAIL            EMAIL           1023       141           39 0.27659574
11: 2015-05-17 2015-05-23          PHONE      PHONE - C2C            156        44            5 0.11363636
12: 2015-05-17 2015-05-23          PHONE   PHONE - GOOGLE            224        46            7 0.15217391
13: 2015-05-17 2015-05-23          PHONE    PHONE - OTHER            553       165           11 0.06666667
14: 2015-05-17 2015-05-23          PHONE PHONE - OTHER_DD            386       108           11 0.10185185
15: 2015-05-24 2015-05-30           CHAT             CHAT             25         2            1 0.50000000
16: 2015-05-24 2015-05-30          EMAIL            EMAIL             33         3            2 0.66666667
17: 2015-05-24 2015-05-30          PHONE      PHONE - C2C              8         0            0        NaN
18: 2015-05-24 2015-05-30          PHONE   PHONE - GOOGLE              6         2            0 0.00000000
19: 2015-05-24 2015-05-30          PHONE    PHONE - OTHER             10         2            1 0.50000000
20: 2015-05-24 2015-05-30          PHONE PHONE - OTHER_DD             11         1            0 0.00000000

How can I use the ddply & transform functions to turn the above into:

WEEK_START    WEEK_END    PHONE.TOTAL_CONTACTS  CHAT.TOTAL_CONTACTS  EMAIL.TOTAL_CONTACTS     
2015-05-03    2015-05-09  sum(total_contacts)   sum(total_contacts)  sum(total_contacts)
2015-05-10    2015-05-17  "                     "                    "
2015-05-24    2015-05-30  "                     "                    "

, where columns [, 3:5] are the COMM_TYPE_CODE and the values are summed, according to week start & end?

Here's sample data:

set.seed(1234)
foo <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
  WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
  COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
  TOTAL_CONTACTS = rbinom(12, 10000, .1))

Thanks!!

Ray
  • 3,137
  • 8
  • 32
  • 59

3 Answers3

2

Try

library(reshape2)
dcast(foo, WEEK_START+WEEK_END~COMM_TYPE_CODE, value.var='TOTAL_CONTACTS' , sum)

For multiple value columns, the devel version of data.table i.e. v1.9.5 can be used.

 set.seed(24)
 foo$TOTAL_HMD <- sample(900:1200, 12, replace=FALSE)

library(data.table)#v1.9.5+
dcast(setDT(foo), WEEK_START+WEEK_END~COMM_TYPE_CODE, 
           value.var=c('TOTAL_CONTACTS', 'TOTAL_HMD'), sum)
#     WEEK_START   WEEK_END CHAT_sum_TOTAL_CONTACTS EMAIL_sum_TOTAL_CONTACTS
#1: 2015-05-03 2015-05-09                     971                     2033
#2: 2015-05-10 2015-05-16                    1013                     2027
#3: 2015-05-17 2015-05-23                    1014                     1975
#4: 2015-05-24 2015-05-30                     987                     1984
 #   CHAT_sum_TOTAL_HMD EMAIL_sum_TOTAL_HMD
#1:                988                2230
#2:                967                2146
#3:               1110                2058
#4:               1054                2131

Or

 library(reshape2)
 recast(foo, id.var=1:3, ...~COMM_TYPE_CODE+variable, value.var='value', sum)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This helps!! Is it possible to sum across 2 different columns or do I have the call the function twice? E.g. both 'TOTAL_CONTACTS' and 'TOTAL_HMD' – Ray May 28 '15 at 20:46
  • @Ray Your example dataset was with one value column. So, I created a new column and tested it. The update using the devel version of data.table will get you the expected result – akrun May 28 '15 at 20:50
  • Hey Akrun - I actually figured it out by combining `melt` and `dcast`. Check out my answer below and lemme know if that makes more sense. – Ray May 28 '15 at 21:15
  • When I tried passing a vector into `value.var` like above, I got the following error `Error in .subset2(x, i, exact = exact) : subscript out of bounds In addition: Warning message: In if (!(value.var %in% names(data))) { : the condition has length > 1 and only the first element will be used` – Ray May 28 '15 at 21:21
  • @Ray I am not sure how you got the error message.I am using the devel version of data.table, which has this feature in dcast for taking multiple value columns. For the `melt/dcast` a convenient wrapper would be `recast`. – akrun May 29 '15 at 04:37
2

OK, so after digging around, I arrived found: Cast multiple value columns

Applying that:

set.seed(1234)
foo <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
  WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
  COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
  TOTAL_CONTACTS = rbinom(12, 10000, .1),
  TOTAL_HMD = sample(900:1200, 12, replace=FALSE))

library(reshape2)
melt.foo <- melt(foo, id.vars = 1:3)  # Note first 4 columns
pivot.foo <- dcast(melt.contacts.byChannel.weekly, WEEK_START+WEEK_END ~ COMM_TYPE_CODE + variable, fun.aggregate = sum)
Community
  • 1
  • 1
Ray
  • 3,137
  • 8
  • 32
  • 59
  • 1
    Thanks for linking that post.. I've updated with [data.table's new dcast function's feature to cast multiple columns directly](http://stackoverflow.com/a/30517531/559784) as @akrun has showed in his answer. – Arun May 28 '15 at 21:39
  • @Arun: I do not think this work request actually needs two row-designators, but future users may certainly benefit it the data is more complex. – IRTFM May 29 '15 at 00:04
  • @BondedDust, OP has asked about multiple value columns in comments under Akrun's answer.. – Arun May 29 '15 at 07:10
  • Thanks for clarifying. What's holding back release of ver 1.9.5 to CRAN? – IRTFM May 29 '15 at 15:32
0

Noting that there is a 1-1 relationship between week start- and end-dates, you don't really need to have two row-designators. Why not just:

with (foo,  tapply(TOTAL_CONTACTS,
              INDEX=   list( WeekStart_End= paste( WEEK_START, WEEK_END, sep=" - "),
                             Sum_CONTACT_CHANNEL=COMM_TYPE_CODE),
                 FUN=sum) )

#----------------
                         Sum_CONTACT_CHANNEL
WeekStart_End             CHAT EMAIL PHONE
  2015-05-03 - 2015-05-09  971  1025  1008
  2015-05-10 - 2015-05-16 1013  1025  1002
  2015-05-17 - 2015-05-23 1014   967  1008
  2015-05-24 - 2015-05-30  987   973  1011

And noting the format of the row-numbers, that's probably not a dataframe but rather a 'data.table,' although data.tables do inherit from data.frames.

IRTFM
  • 258,963
  • 21
  • 364
  • 487