3

So, I have a process that I currently use Excel for, and I want to find the most efficient way to do it in R.

My data take this shape:

ID <- c(rep(1, 3), rep(2, 3))
Source <- rep(c("A", "A", "B"), 2)
Total <- c(11, 13, 12, 25, 27, 26)
Actions <- c(3, 2, 3, 8, 9, 10)
df <- data.frame(ID, Source, Total, Actions)
df 
#   ID Source Total Actions
# 1  1      A    11       3
# 2  1      A    13       2
# 3  1      B    12       3
# 4  2      A    25       8
# 5  2      A    27       9
# 6  2      B    26      10

I run an aggregate on Total and Actions:

df2 <- aggregate(cbind(Total, Actions) ~ ID + Source, 
          data = df, FUN=sum)
df2
#   ID Source Total Actions
# 1  1      A    24       5
# 2  2      A    52      17
# 3  1      B    12       3
# 4  2      B    26      10

But what I really want is a situation where the totals for A and totals for B are separate columns in the same table. Currently I am doing it like this:

df2_A <- df2[(df2$Source == "A"), ]
df2_B <- df2[(df2$Source == "B"), ]
x <- merge(df2_A, df2_B, by.x = "ID", by.y = "ID")
x 
#   ID Source.x Total.x Actions.x Source.y Total.y Actions.y
# 1  1        A      24         5        B      12         3
# 2  2        A      52        17        B      26        10   

My question is, is there a more elegant way to go from df to x in one step? I feel like what I am doing now is a crude hack, and I want to expand my R knowledge.

mmyoung77
  • 1,343
  • 3
  • 14
  • 22
  • Related: [Reshape multiple values at once](https://stackoverflow.com/questions/27247078/reshape-multiple-values-at-once). "Just" add `fun = sum` to the `dcast` steps. – Henrik May 02 '18 at 21:33
  • Thanks, Henrik, that thread is teaching me a lot also. – mmyoung77 May 03 '18 at 14:46

3 Answers3

4

You may want to simplify the whole operation into a single pipe

library(dplyr)
library(tidyr)
df %>% group_by(ID, Source) %>% 
  summarize_all(sum) %>% ungroup()%>%
  gather(key, value, -c(ID, Source)) %>% 
  unite(newkey, key, Source) %>% 
  spread(newkey, value)
#> # A tibble: 2 x 5
#>      ID Actions_A Actions_B Total_A Total_B
#> * <dbl>     <dbl>     <dbl>   <dbl>   <dbl>
#> 1     1         5         3      24      12
#> 2     2        17        10      52      26
dmi3kno
  • 2,943
  • 17
  • 31
  • This is probably going to end up being the answer, but I'm curious if anyone sends anything else. Thanks! – mmyoung77 May 02 '18 at 21:06
  • There's also a `data.table` solution (which allows casting multiple columns at once without `unite` step), but I'll let other people answer. – dmi3kno May 02 '18 at 21:08
1

A reshape2 version:

library(reshape2)

> dcast(melt(df, id.vars = ID), ID ~ Source + variable, fun.aggregate = sum)
  ID A_Total A_Actions B_Total B_Actions
1  1      24         5      12         3
2  2      52        17      26        10

And a kind of questionable base version, that gets you pretty close to the desired output, including appropriate source columns:

do.call(rbind, lapply(split(df, f = ID), function(x) {
  y <- (split(x, Source))
  ID = x[[1]][1]
  cbind(ID, do.call(cbind, lapply(y, function(z) { 
    w <- data.frame(Source = z[1,2])
    q <- data.frame(t(colSums(z[,c("Total", "Actions")])))
    data.frame(w,q)
  })))
}))

  ID A.Source A.Total A.Actions B.Source B.Total B.Actions
1  1        A      24         5        B      12         3
2  2        A      52        17        B      26        10
Luke C
  • 10,081
  • 1
  • 14
  • 21
1

OP is pretty close to solution. He has to just take another aggregate of over the df2 and he would have got the answer.

In short aggregate of aggregate is an option to find solution as:

aggregate(cbind(Source,Total,Actions)~ID, 
  data=aggregate(cbind(Total, Actions) ~ ID + Source, data = df, FUN=sum), I)
#   ID Source.1 Source.2 Total.1 Total.2 Actions.1 Actions.2
# 1  1        1        2      24      12         5         3
# 2  2        1        2      52      26        17        10
MKR
  • 19,739
  • 4
  • 23
  • 33
  • @mmyoung77 Thanks for your comments. I appreciate your efforts to put details in the question. BTW, it seems you have forgotten to mark/accept any of the available solutions. – MKR May 03 '18 at 15:13