Had trouble figuring out the best way to phrase this in the title, but the broader issue here is I'm trying to combine two non-overlapping columns (split by gender) in a dataset into a third, gender-neutral column with values for each row/participant... and then do that for i times.
Here's an example. My dataset is ELSH2, and the first set of columns will be HTM1, HTW1, and HT1. I figured out pretty quickly how to combine columns just once:
ELSH2$HT1 <- ifelse(is.na(ELSH2$HTM1), ELSH2$HTW1, ELSH2$HTM1)
So all the values from the HTW1 and HTM1 columns are now combined in the HT1 column. But essentially what I want is:
ELSH2$HTi <- ifelse(is.na(ELSH2$HTMi), ELSH2$HTWi, ELSH2$HTMi)
where i is each sequential number in the range 1-k, k being the largest number at the end of column names matching the above strings (i.e., there are k columns that start with HTM or HTW; HTM and HTW will always have the same k value). In this example, k=5, but I'm going to do this with multiple cases (i.e., other strings to match in place of HTM/HTW) involving different values of k.
I tried using grepl
:
ELSH2[,grepl("HT.", names(ELSH2))] <- ifelse(
is.na(ELSH[,grepl("HTM.", names(ELSH2))]),
ELSH2[,grepl("HTW.", names(ELSH2))],
ELSH2[,grepl("HTM.", names(ELSH2))])
But I'm getting the following error:
Warning message:
In `[<-.data.frame`(`*tmp*`, , grepl("HTM.", names(ELSH2)), value = list( :
provided 5300 variables to replace 10 variables
I'm pretty sure there's something wrong with the way I'm trying to make the HT columns here, but even if I create them manually, I get the same sort of error.
EDIT: Here's a sample dataset.
HTM1<- rnorm(10)
HTW1<- rnorm(10)
HTM2<- rnorm(10)
HTW2<- rnorm(10)
HTM3<- rnorm(10)
HTW3<- rnorm(10)
HTM4<- rnorm(10)
HTW4<- rnorm(10)
HTM5<- rnorm(10)
HTW5<- rnorm(10)
HTM <- data.frame(HTM1,HTM2,HTM3,HTM4,HTM5)
HTW <- data.frame(HTW1,HTW2,HTW3,HTW4,HTW5)
HTM[1, ] <- NA
HTM[3, ] <- NA
HTM[5, ] <- NA
HTM[7, ] <- NA
HTM[9, ] <- NA
HTW[2, ] <- NA
HTW[4, ] <- NA
HTW[6, ] <- NA
HTW[8, ] <- NA
HTW[10, ] <- NA
ELSH2 <- cbind(HTW, HTM)
And I want the final HT columns to look like this poorly photoshopped monstrosity:
Just interleaving the columns where they have missing values.