This is a follow up to this question: Duplicating observations of a dataframe, but also replacing specific variable values in R
I have tried to write as succinctly as possible, whilst giving all necessary information. In this current example, I have a df that looks like this:
df<-data.frame(alpha=c(1, "3, 4", "2, 4, 5", 2, 1, 3, "1, 2", "1, 2, 3"),
beta=c("2, 4", "3, 4", 1, 3, 3, "1, 4", "1, 2", "1, 2, 3"),
color=c("red", "yellow"))
# alpha beta color
#1 1 2, 4 red
#2 3, 4 3, 4 yellow
#3 2, 4, 5 1 red
#4 2 3 yellow
#5 1 3 red
#6 3 1, 4 yellow
#7 1, 2 1, 2 red
#8 1, 2, 3 1, 2, 3 yellow
Desired output
I want it to end up looking something like this (the order of the rows isn't that important to me):
# alpha beta color value
#1 1 2 red 1
#2 1 4 red 1
#3 3 4 yellow 0.5
#4 2 1 red 1
#5 4 1 red 1
#6 5 1 red 1
#7 2 3 yellow 1
#8 1 3 red 1
#9 3 1 yellow 1
#10 3 4 yellow 1
#11 1 2 red 0.5
#12 1 2 yellow 0.5
#13 1 3 yellow 0.5
#14 2 3 yellow 0.5
Whenever there is an observation of alpha or beta with multiple non-overlapping/non-matching entries, then that observation is expanded. e.g. Line 1 of the original df has alpha="1" and beta="2, 4". These become two separate rows in the output with the "2" and "4" of beta separated. The information kept in the variable 'color' is kept for each observation. Additionally, a new variable 'value' gets a 1.
The key difference between this and the previous question is the presence of rows where alpha==beta .
Here, I only want to keep unique combinations of alpha and beta, and not duplicates. e.g. row 8 of the original df has alpha="1, 2, 3" and beta="1, 2, 3". I want to have separate rows where "alpha=1, beta=2", "alpha=1,beta=3", "alpha=2,beta=3". Again, the 'color' variable will be duplicated. However, here, value needs to be '0.5'.
What I have tried:
I couldn't figure out how to do this all in one, so I first subset the df according to whether alpha is equal to beta or not.
x <- df[df$alpha!=df$beta,]
df$alpha<-as.character(df$alpha)
df$beta<-as.character(df$beta)
fun1 <- function(df){
df$alpha<-as.character(df$alpha)
df$beta<-as.character(df$beta)
do.call(rbind, with(df, Map(expand.grid,
alpha = strsplit(alpha, ", "),
beta = strsplit(beta, ", "),
color = color,
value = 1
)))
}
fun1(x)
which gives:
alpha beta color value
#1 1 2 red 1
#2 1 4 red 1
#3 2 1 red 1
#4 4 1 red 1
#5 5 1 red 1
#6 2 3 yellow 1
#7 1 3 red 1
#8 3 1 yellow 1
#9 3 4 yellow 1
Then the observations where alpha==beta , which is where I'm getting stuck ...
x2 <- df[df$alpha==df$beta,]
x2
# alpha beta color
#2 3, 4 3, 4 yellow
#7 1, 2 1, 2 red
#8 1, 2, 3 1, 2, 3 yellow
My idea was to split the data using strsplit
and then to use combn
to find the combinations I'm looking for and to bind back together. However, this is not going to work like this if I want to duplicate other variables...
a<-strsplit(x2$alpha, ", ")
a.combs <- lapply(a, function(x) c(combn(x, 2, simplify=FALSE)))
matrix(unlist(a.combs),ncol=2, byrow=T)
gives:
# [,1] [,2]
#[1,] "3" "4"
#[2,] "1" "2"
#[3,] "1" "2"
#[4,] "1" "3"
#[5,] "2" "3"
Any ideas on how to do get these combinations along with the pre-existing 'color' variable and adding in the new 'value' variable are greatly appreciated.