I recently tried to match adjacent identical rows in a data frame based on two variables (`Condition1` and `Outcome1` below) and drop the duplicates. I have seen people do this across all rows, but not for adjacent rows only, which is why I developed the following three-step workaround (I hope I did not overthink things):
- I lagged the variables on which I wanted the matching to be based.
- I compared each variable with its lagged counterpart.
- I deleted all rows in which both pairs were identical (and removed the helper columns that were no longer needed).
```r
Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Condition1 <- c(0, 1, 0, 0, 1)
Outcome1 <- c(0, 0, 0, 0, 1)
mwa.df <- data.frame(Case, Condition1, Outcome1)

new.df <- mwa.df
# Step 1: shift each variable up by one row (value of the next row), padded with 0
Condition_lag <- c(new.df$Condition1[-1], 0)
Outcome_lag <- c(new.df$Outcome1[-1], 0)
new.df <- cbind(new.df, Condition_lag, Outcome_lag)
# Step 2: flag rows whose two variables both equal those of the next row
new.df$Comp <- 0
new.df$Comp[new.df$Outcome1 == new.df$Outcome_lag & new.df$Condition1 == new.df$Condition_lag] <- 1
# Step 3: drop the flagged rows, then the helper columns
new.df <- subset(new.df, Comp == 0)
new.df <- subset(new.df, select = -c(Condition_lag, Outcome_lag, Comp))
```
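One detail worth checking in the script above: the shifted vectors are padded with `0`, so the last row is compared against an artificial "next row" of zeros. If a data frame ends in a row with `Condition1 = 0` and `Outcome1 = 0`, that row is flagged even though no row follows it. A minimal sketch, using a hypothetical data frame `tail.df`, comparing `0` padding with `NA` padding (`%in% TRUE` turns the resulting `NA` into `FALSE`, so the last row is never flagged):

```r
# Hypothetical data frame whose last row is (0, 0)
tail.df <- data.frame(Case = c("Case 1", "Case 2", "Case 3"),
                      Condition1 = c(1, 0, 0),
                      Outcome1 = c(1, 1, 0))

# Padding with 0: the last row "matches" the padding and gets flagged
zero_pad <- tail.df$Condition1 == c(tail.df$Condition1[-1], 0) &
            tail.df$Outcome1   == c(tail.df$Outcome1[-1], 0)

# Padding with NA: the last comparison is NA, which %in% TRUE maps to FALSE
na_pad <- (tail.df$Condition1 == c(tail.df$Condition1[-1], NA) &
           tail.df$Outcome1   == c(tail.df$Outcome1[-1], NA)) %in% TRUE

print(zero_pad)  # FALSE FALSE TRUE
print(na_pad)    # FALSE FALSE FALSE
```

With the question's own data the last row is (1, 1), so both paddings give the same result there; the difference only shows up for data frames that end in (0, 0).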
This worked just fine. But when I tried to turn it into a function, because I had to apply the same operation to a large number of data frames, the lag did not work: the `condition_lag <- c(new.df$condition[-1],0)` and `outcome_lag <- c(new.df$outcome[-1],0)` operations did not seem to be carried out. The function code was:
```r
FLC.Dframe <- function(old.df, condition, outcome){
  new.df <- old.df
  condition_lag <- c(new.df$condition[-1],0)
  outcome_lag <- c(new.df$outcome[-1],0)
  new.df <- cbind(new.df, condition_lag, outcome_lag)
  new.df$comp <- 0
  new.df$comp[new.df$outcome == new.df$outcome_lag & new.df$condition == new.df$condition_lag] <- 1
  new.df <- subset(new.df, comp == 0)
  new.df <- subset(new.df, select = -c(condition_lag, outcome_lag, comp))
  return(new.df)
}
```
To use the function, I called `new.df <- FLC.Dframe(mwa.df, Condition1, Outcome1)`.
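A likely cause, offered as a hypothesis rather than a confirmed diagnosis: `$` does not evaluate the name on its right, so inside the function `new.df$condition` looks for a column literally called `condition`, not for the column passed in as the `condition` argument. No such column exists, so it returns `NULL`, and `c(NULL[-1], 0)` is simply `0`. The `[[` operator, by contrast, evaluates its argument. A small standalone check, using a hypothetical data frame `toy.df` and a variable `condition` that holds a column name as a string:

```r
toy.df <- data.frame(Condition1 = c(0, 1, 0), Outcome1 = c(0, 0, 1))
condition <- "Condition1"   # hypothetical variable holding a column name

# `$` ignores the variable and looks for a column named "condition"
print(toy.df$condition)     # NULL

# `[[` evaluates `condition`, so it finds the column "Condition1"
print(toy.df[[condition]])  # 0 1 0

# With a NULL column, the shifted vector collapses to the padding value alone
print(c(toy.df$condition[-1], 0))  # 0
```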
Could someone help me with this? Many thanks in advance.
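A minimal sketch of the function using `[[` instead of `$`, so that the column whose name is passed in is actually selected. It assumes the column names are given as character strings, i.e. `FLC.Dframe(mwa.df, "Condition1", "Outcome1")` rather than unquoted names. It also pads the shifted vectors with `NA` instead of `0`, so the last row, which has no successor, is never dropped; that is a deliberate change from the original script, and the helper names `cond_next`, `out_next` and `same_as_next` are hypothetical.

```r
Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Condition1 <- c(0, 1, 0, 0, 1)
Outcome1 <- c(0, 0, 0, 0, 1)
mwa.df <- data.frame(Case, Condition1, Outcome1)

# Removes each row whose `condition` and `outcome` values both equal those of
# the next row. `condition` and `outcome` are column names given as strings.
FLC.Dframe <- function(old.df, condition, outcome) {
  # [[ ]] evaluates its argument (unlike $), so the passed-in name is used
  cond <- old.df[[condition]]
  out  <- old.df[[outcome]]
  # Values of the next row; NA for the last row, which has no successor
  cond_next <- c(cond[-1], NA)
  out_next  <- c(out[-1], NA)
  # %in% TRUE maps the NA comparison of the last row to FALSE
  same_as_next <- (cond == cond_next & out == out_next) %in% TRUE
  old.df[!same_as_next, , drop = FALSE]
}

new.df <- FLC.Dframe(mwa.df, "Condition1", "Outcome1")
print(new.df)  # keeps Case 1, Case 2, Case 4, Case 5
```

If the unquoted call style from the question is preferred, `deparse(substitute(condition))` at the top of the function converts an unquoted column name to a string; that variant is not shown here.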