I had trouble figuring out the best way to phrase this in the title, but the broader issue is that I'm trying to combine two non-overlapping columns (split by gender) in a dataset into a third, gender-neutral column with a value for each row/participant, and then repeat that for each of several numbered column pairs.

Here's an example. My dataset is ELSH2, and the first set of columns will be HTM1, HTW1, and HT1. I figured out pretty quickly how to combine columns just once:

ELSH2$HT1 <- ifelse(is.na(ELSH2$HTM1), ELSH2$HTW1, ELSH2$HTM1)

So all the values from the HTW1 and HTM1 columns are now combined in the HT1 column. But essentially what I want is:

ELSH2$HTi <- ifelse(is.na(ELSH2$HTMi), ELSH2$HTWi, ELSH2$HTMi)

where i is each sequential number in the range 1-k, k being the largest number at the end of column names matching the above strings (i.e., there are k columns that start with HTM or HTW; HTM and HTW will always have the same k value). In this example, k=5, but I'm going to do this with multiple cases (i.e., other strings to match in place of HTM/HTW) involving different values of k.

I tried using grepl:

ELSH2[,grepl("HT.", names(ELSH2))] <- ifelse(
    is.na(ELSH2[,grepl("HTM.", names(ELSH2))]),
    ELSH2[,grepl("HTW.", names(ELSH2))], 
    ELSH2[,grepl("HTM.", names(ELSH2))])

But I'm getting the following warning:

Warning message:
In `[<-.data.frame`(`*tmp*`, , grepl("HTM.", names(ELSH2)), value = list( :
  provided 5300 variables to replace 10 variables

I'm pretty sure there's something wrong with the way I'm trying to make the HT columns here, but even if I create them manually, I get the same sort of error.
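For what it's worth, the warning can probably be reproduced on a much smaller case. The sketch below uses a made-up four-row data frame `df` (not the real ELSH2) and assumes the cause is that `ifelse()` is being handed whole data frames: `is.na()` on a data frame returns a logical matrix, and `ifelse()` then returns a list with one element per cell rather than a data frame of combined columns.

```r
# Tiny stand-in for the real data: two M/W column pairs, four rows.
# The values are invented purely for illustration.
df <- data.frame(
  HTM1 = c(NA, 1.2, NA, 0.7), HTM2 = c(NA, 2.5, NA, 1.1),
  HTW1 = c(0.4, NA, 0.9, NA), HTW2 = c(1.8, NA, 0.3, NA)
)
m <- df[, c("HTM1", "HTM2")]
w <- df[, c("HTW1", "HTW2")]

# is.na() on a data frame gives a logical matrix with one entry per cell.
test <- is.na(m)

# ifelse() shapes its result like `test`, recycling the data frames as lists.
res <- ifelse(test, w, m)

is.list(res)   # the result is a list, not a data frame
length(res)    # one element per cell of `test` (rows x columns)
```

That would explain why the assignment reports far more "variables" than there are columns to fill: the replacement has one element per cell of the matched columns.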

EDIT: Here's a sample dataset.

HTM1<- rnorm(10)
HTW1<- rnorm(10)
HTM2<- rnorm(10)
HTW2<- rnorm(10)
HTM3<- rnorm(10)
HTW3<- rnorm(10)
HTM4<- rnorm(10)
HTW4<- rnorm(10)
HTM5<- rnorm(10)
HTW5<- rnorm(10)

HTM <- data.frame(HTM1,HTM2,HTM3,HTM4,HTM5)
HTW <- data.frame(HTW1,HTW2,HTW3,HTW4,HTW5)
HTM[1, ] <- NA
HTM[3, ] <- NA
HTM[5, ] <- NA
HTM[7, ] <- NA
HTM[9, ] <- NA

HTW[2, ] <- NA
HTW[4, ] <- NA
HTW[6, ] <- NA
HTW[8, ] <- NA
HTW[10, ] <- NA

ELSH2 <- cbind(HTW, HTM)

ELSH2 looks like this: Original

And I want the final HT columns to look like this poorly photoshopped monstrosity: Desired result

Just interleaving the columns where they have missing values.

geedlet
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 17 '20 at 20:49
  • @MrFlick Done! Hopefully that's helpful? – geedlet Jan 17 '20 at 21:20

1 Answer

One possibility is to treat this as a reshaping problem. Here we use dplyr and tidyr to make that easier:

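library(dplyr)
library(tidyr)

ELSH2 %>%
  # keep a row id; the dplyr:: prefix avoids masking by plyr::mutate if plyr is attached
  dplyr::mutate(row = dplyr::row_number()) %>%
  # stack all HTW*/HTM* columns into name/value pairs
  pivot_longer(HTW1:HTM5) %>%
  # drop the gender column that is missing for each participant
  dplyr::filter(!is.na(value)) %>%
  # split e.g. "HTM3" into prefix "HTM" and code "3"
  tidyr::extract(name, into = c("prefix", "code"), "^([A-Za-z]+)(\\d+)$") %>%
  dplyr::mutate(name = paste0("HT", code)) %>%
  # one HT column per code; id_cols is named so the call does not rely on argument order
  pivot_wider(id_cols = row, names_from = name, values_from = value)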
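If installing the tidyverse is not an option, the same result can probably be had in base R by looping over the numeric suffixes and applying the single-column `ifelse()` from the question to one pair at a time. This is only a sketch: `combine_pairs()` and its `stem` argument are hypothetical names, and it assumes every `<stem>M<i>` column has a matching `<stem>W<i>` column, as described in the question.

```r
# Rebuild sample data shaped like the question's (values are random).
set.seed(1)
HTM <- as.data.frame(matrix(rnorm(50), nrow = 10,
                            dimnames = list(NULL, paste0("HTM", 1:5))))
HTW <- as.data.frame(matrix(rnorm(50), nrow = 10,
                            dimnames = list(NULL, paste0("HTW", 1:5))))
HTM[c(1, 3, 5, 7, 9), ] <- NA
HTW[c(2, 4, 6, 8, 10), ] <- NA
ELSH2 <- cbind(HTW, HTM)

# Hypothetical helper, not part of any package.
# Assumes each <stem>M<i> column has a matching <stem>W<i> column.
combine_pairs <- function(df, stem) {
  # find the M columns and pull out their numeric suffixes (k is implied)
  m_cols <- grep(paste0("^", stem, "M[0-9]+$"), names(df), value = TRUE)
  suffixes <- sub(paste0("^", stem, "M"), "", m_cols)
  for (i in suffixes) {
    m <- df[[paste0(stem, "M", i)]]
    w <- df[[paste0(stem, "W", i)]]
    # same logic as the one-off version: take W where M is missing
    df[[paste0(stem, i)]] <- ifelse(is.na(m), w, m)
  }
  df
}

ELSH2 <- combine_pairs(ELSH2, "HT")
```

Because the suffixes are read from the column names, the same call should work for other stems with a different k, for example `combine_pairs(ELSH2, "WT")` if columns `WTM1`, `WTW1`, and so on existed (those names are made up here).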
MrFlick
  • thanks for the suggestion, but it told me that pivot_longer wasn't a thing, so I tried to install the tidyverse package, and now I'm down a rabbit hole ending in this message: Error: package or namespace load failed for ‘tidyverse’: .onAttach failed in attachNamespace() for 'tidyverse', details: call: get(Info[i, 1], envir = env) error: lazy-load database 'C:/Users/User/Documents/Other Stuff/Miscellaneous/R/rstudioapi/R/rstudioapi.rdb' is corrupt – geedlet Jan 17 '20 at 22:10
  • Ok, I think I resolved that, and when I ran your suggestion, I got: Error: row_number() should only be called in a data context Run `rlang::last_error()` to see where the error occurred. > rlang::last_error() row_number() should only be called in a data context Backtrace: 1. plyr::mutate(., row = row_number()) 8. [ base::eval(...) ] with 1 more call 9. dplyr::row_number() 11. dplyr:::from_context("..group_size") 12. `%||%`(...) 13. plyr::mutate(., row = row_number()) Run `rlang::last_trace()` to see the full context. – geedlet Jan 17 '20 at 22:19