How to merge and create subsets in a for loop (R)?

Question

I want to optimize this code in a for loop for a bigger set of data.

library(reshape2)
Customer<- c("Susan","Louis", "Frank","Susan")
Seller<- c("Ivan", "Donald","Chris","Ivan")
Service<-c("COU","CAR", "FCL","CAR")
Billingmean<- c(100,200,300,400)
WrsHoldSum<-c(0,0,0,0)
Group<- c("n1","n2"," "," ")
B1<- c(0,2,2,1) 
B2<-c(9,8,7,6)
B3<- c(5,4,3,2)
df<- data.frame(Customer, Seller,Service, Billingmean,WrsHoldSum, Group,B1,B2,B3)
sub1<- dcast(data= df, formula= Customer+Group+Seller+WrsHoldSum~Service,fun.aggregate= sum,value.var= "Billingmean")
sub2<- dcast(data= df, formula= Customer+Group+Seller+WrsHoldSum~Service,fun.aggregate= sum,value.var= "B1") 
sub3<- dcast(data= df, formula= Customer+Group+Seller+WrsHoldSum~Service,fun.aggregate= sum,value.var= "B2")
sub4<- dcast(data= df, formula= Customer+Group+Seller+WrsHoldSum~Service,fun.aggregate= sum,value.var= "B3")

finaldf<- merge (sub1,sub2, sub3, sub4,by=c("Customer","Group","Seller","WrsHoldSum"))

Maybe it is, but it's much better to have a short input mimicking your real issue, that we can actually load in R, and the output you'd like to get from it. — moodymudskipper, May 24 '18 at 19:17
Sorry I'm new in the community. Where do I upload my r code? — Diego Castillo, May 24 '18 at 19:25
Please read this page on how to make a great R example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1. Don't upload your code: paste the smallest amount needed to show what you want to do into a properly formatted code block in your question. — divibisan, May 24 '18 at 19:26
Don't worry about all 100 datasets and 305 columns: make a minimal example with 2 datasets and 3 columns (or something small like that). If we can get that to work the way you want it to, scaling it up will be trivial. — divibisan, May 24 '18 at 19:28
It's much better Diego, your desired output is not very clear however, do you want to get a data.frame with 4 rows and 19 columns ? — moodymudskipper, May 24 '18 at 20:04
in this case every new subset only adds 3 new columns. Those 3 new columns i want to merge them on a bigger database. Here I only made it for 4 records. I want to do it for longer sets of data. — Diego Castillo, May 24 '18 at 21:35
Hello @Moody_Mudskipper I need to convert the output from JdM below to a dataframe but I can't do it. I tried as.data.frame() funtion but it didn't work — Diego Castillo, May 26 '18 at 19:51

score 0 · Accepted Answer · edited Jun 20 '20 at 09:12

0

EDIT

Since you also want to know the origin of the columns that were not part of the by argument in merge, you can name those columns before Reduce() using lapply which will conveniently output a list and shorten your Reduce() statement:

Identify the name of the tables for lapply():

tNames <- grep(x = ls(), pattern = "^sub", value = T)

Then use lapply() with a custom function that will output the tables with the relevant columns' names already modified. Use the pipe %>% for convenience to Reduce() the list with merge():

lapply(seq_along(tNames), function(x){
  tSym <- as.name(tNames[[x]])
  
  d1 <- copy(eval(tSym))
  
  cols <- grep(x = names(d1), pattern = "^CAR|^COU|^FCL", value = T)
  
  setnames(d1, old = cols, new = paste0(cols, " B", x))
  
  return(d1)
  
}) %>% Reduce(function(x, y) merge(x, y, by = c("Customer","Group","Seller","WrsHoldSum")), .)

Result:

  Customer Group Seller WrsHoldSum CAR B1 COU B1 FCL B1 CAR B2 COU B2 FCL B2 CAR B3 COU B3 FCL B3 CAR B4 COU B4 FCL B4
1    Frank        Chris          0      0      0    300      0      0      2      0      0      7      0      0      3
2    Louis    n2 Donald          0    200      0      0      2      0      0      8      0      0      4      0      0
3    Susan         Ivan          0    400      0      0      1      0      0      6      0      0      2      0      0
4    Susan    n1   Ivan          0      0    100      0      0      0      0      0      9      0      0      5      0

edited Jun 20 '20 at 09:12

Community

1
1

answered May 24 '18 at 19:52

JdeMello

1,708
15
23

I allowed myself to edit your post following OP's changes in the question, I hope it's fine – moodymudskipper May 24 '18 at 20:07
I want an output just like this but I would like to identify which columns were merged from B1, B2, B3 etc. – Diego Castillo May 24 '18 at 21:55
I don't understand your question. Isn't B* summarized in the previous `dcast()` step? – JdeMello May 25 '18 at 13:43
Yeah that's right, but I want to see which columns of CAR, COU, FCL, belong to each dcast(). I don't want to see duplicates over the name columns. I want something like CAR B1, COU B1, FCL B1, CAR B2, COU B2, FCL B2... just to know from where the combination came from. – Diego Castillo May 25 '18 at 13:48
OK, you could've mentioned this in your question. You can try this: `library(magrittr); Reduce(function(x, y) merge(x, y, by = c("Customer","Group","Seller","WrsHoldSum")), list(sub1,sub2, sub3, sub4)) %>% setnames(., old = grep(x = names(.), pattern = "^CAR|^COU|^FCL", value = T), new = paste0(grep(x = names(.), pattern = "^CAR|^COU|^FCL", value = T), rep(paste0(".B", seq_along(list(sub1, sub2, sub3, sub4))), each = 3)))` – JdeMello May 25 '18 at 14:05
replace this part then `new = paste0(substr(grep(x = names(.), pattern = "^CAR|^COU|^FCL", value = T), 1, 3)` – JdeMello May 25 '18 at 14:15
it still shows me an error message. "column names ‘CAR.x’, ‘COU.x’, ‘FCL.x’, ‘CAR.y’, ‘COU.y’, ‘FCL.y’ " are duplicated in the resultHow could I optimize my code for a larger quantity of subs? – Diego Castillo May 25 '18 at 14:46
@DiegoCastillo No worries. Don't forget to upvote and check the answer as correct. Some ppl may have a similar question to you in the future. Cheers – JdeMello May 25 '18 at 15:49
Hey @JdM, now everything is running ok. Just a final request, i need to export this table into a csv. and view it as a dataframe in R. Now I'm only able to see it as a table format. I tried ´as.data.frame ()´ but it didn't work. – Diego Castillo May 26 '18 at 01:49
do you need to export into csv or import a csv table? – JdeMello May 27 '18 at 15:46

How to merge and create subsets in a for loop (R)?

1 Answers1

EDIT