I am trying to use group_split() on a list of lists/dataframes.

Here is my code. df is what I have and final_df is what I am trying to get. The names (column c) can be dropped. My goal is to get a single row for each unique combination of a and b.

df <- data.frame(a = c(rep(68,8),rep(70,8)), b = c((1:4),(1:4),(1:4),(1:4)),c = c(rep("Mike",4),rep("Joe",4),rep("Mike",4),rep("Joe",4)), d=c(70,71,75,79,72,69,66,90,70,77,74,72,72,69,66,90), e=c(30,32,44,42,22,23,24,21,30,37,41,42,21,22,24,20))

final_df <- data.frame(a=c(rep(68,4),rep(70,4)), b=c((1:4),(1:4)), d_1 = c(70,71,75,79,70,77,74,72), e_1 = c(30,32,44,42,30,37,41,42), d_2 = c(72,69,66,90,72,69,66,90), e_2 = c(22 ,23,24,21,21,22,24,20))


print(df)
    a b    c  d  e
1  68 1 Mike 70 30
2  68 2 Mike 71 32
3  68 3 Mike 75 44
4  68 4 Mike 79 42
5  68 1  Joe 72 22
6  68 2  Joe 69 23
7  68 3  Joe 66 24
8  68 4  Joe 90 21
9  70 1 Mike 70 30
10 70 2 Mike 77 37
11 70 3 Mike 74 41
12 70 4 Mike 72 42
13 70 1  Joe 72 21
14 70 2  Joe 69 22
15 70 3  Joe 66 24
16 70 4  Joe 90 20

print(final_df)

   a b d_1 e_1 d_2 e_2
1 68 1  70  30  72  22
2 68 2  71  32  69  23
3 68 3  75  44  66  24
4 68 4  79  42  90  21
5 70 1  70  30  72  21
6 70 2  77  37  69  22
7 70 3  74  41  66  24
8 70 4  72  42  90  20

Initially, I use group_split on a:

list <- df %>% group_split(a)

Then I think I need to do the same for b, but I cannot get group_split to work a second time. I wrote a function to lapply group_split over the list of data frames:

func <- function(y){lapply(y, y %>% group_split(b))}
list_2 <- lapply(list,function(x){lapply(x,func)})

But this does not work. I get this error:

Error in UseMethod("group_split") : 
  no applicable method for 'group_split' applied to an object of class "c('double', 'numeric')"

I really appreciate any help. I could be going about this in a completely wrong and circuitous way. Thanks again.

1 Answer

I had a go at this but ultimately found a solution provided by nghauran to a similar question by R noob here: Combine duplicate rows in dataframe and create new columns

Applied to your data:

df <- data.frame(a = c(rep(68,8),rep(70,8)),
                 b = c((1:4),(1:4),(1:4),(1:4)),
                 c = c(rep("Mike",4),rep("Joe",4),rep("Mike",4),rep("Joe",4)),
                 d = c(70,71,75,79,72,69,66,90,70,77,74,72,72,69,66,90),
                 e = c(30,32,44,42,22,23,24,21,30,37,41,42,21,22,24,20))

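library(dplyr)
# Collapse each (a, b) group into one row of comma-separated strings.
# funs() is deprecated since dplyr 0.8.0, so a formula is used instead.
df <- df %>%
  group_by(a, b) %>%
  summarise_all(~ paste(., collapse = ","))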

library(splitstackshape)
df <- cSplit(df, c("c","d", "e"), ",")

df <- df[,c(1,2,5,7,6,8)] # Optional: drops c_1/c_2 and reorders the
# columns to match your desired output exactly

> df
    a b d_1 e_1 d_2 e_2
1: 68 1  70  30  72  22
2: 68 2  71  32  69  23
3: 68 3  75  44  66  24
4: 68 4  79  42  90  21
5: 70 1  70  30  72  21
6: 70 2  77  37  69  22
7: 70 3  74  41  66  24
8: 70 4  72  42  90  20
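
As a side note, the error in the question arises because lapply() over a data frame iterates over its columns, so group_split() ends up being called on a numeric vector rather than on a data frame. If the wide table is the real goal, the splitting can probably be skipped altogether. Below is a minimal sketch of an alternative using tidyr::pivot_wider(), assuming tidyr >= 1.0.0 is installed; the helper column idx and the result name wide are my own illustrative names, and it relies on the rows within each (a, b) pair already being in the order you want (Mike first, then Joe).

```r
library(dplyr)
library(tidyr)

df <- data.frame(a = c(rep(68,8),rep(70,8)),
                 b = c((1:4),(1:4),(1:4),(1:4)),
                 c = c(rep("Mike",4),rep("Joe",4),rep("Mike",4),rep("Joe",4)),
                 d = c(70,71,75,79,72,69,66,90,70,77,74,72,72,69,66,90),
                 e = c(30,32,44,42,22,23,24,21,30,37,41,42,21,22,24,20))

wide <- df %>%
  group_by(a, b) %>%
  mutate(idx = row_number()) %>%  # hypothetical helper: 1st/2nd row within each (a, b) pair
  ungroup() %>%
  select(-c) %>%                  # the names are not needed in the output
  pivot_wider(names_from = idx, values_from = c(d, e)) %>%
  select(a, b, d_1, e_1, d_2, e_2)  # reorder to match the desired layout
```

This keeps d and e numeric throughout, whereas the paste/cSplit route goes through character strings and relies on cSplit() converting them back.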
Roasty247