0

I have variables with the names "VA01_01", "VA01_02" etc. and "VA02_01", "VA02_02". Those variables with the prefix VA01 are data from female participants, those with the prefix VA02 are from male participants. Male participants, for example, have NAs in the variables VA01. I already have a factor with values for sex.

What I'd like to do is create a new set of variables that take over the values from both variable types. That is, if it's a male participant, he gets the values of the VA02 variables in that set of variables. So the new set of variables won't have any NAs any more because it won't be based on sex.

Does anyone have a simple solution for that question? I don't know if reshape is the answer because I don't really want to transform my data frame into long format.

Here how it looks like at the beginning:

 structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female", 
 "male"), class = "factor"), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4, 
 NA, 4, NA), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA, 
 3)), .Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
 ), row.names = c(NA, -4L), class = "data.frame")

And here at the end (I'd like to keep the original variables):

structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female", 
"male"), class = "factor"), VA_tot_01 = c(1, 3, 2, 4), VA_tot_02 = c(4, 
5, 4, 3), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4, NA, 4, NA
), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA, 3)), .Names = c("sex", 
"VA_tot_01", "VA_tot_02", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
), row.names = c(NA, -4L), class = "data.frame")
  • 1
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. The answer likely involves reshaping to tall, summarizing, then reshaping back to wide. – MrFlick May 21 '18 at 15:58

1 Answers1

0

Considering the VAR01s and VAR02s don't overlap, you could simply create another variables VAR_tot_xx including the original values from both. It would be something like this:

new_vars <- function(df) {
  vars <- unique(gsub(
    pattern = ".*_", 
    replacement = "_", 
    x = grep(
      pattern = "_[0-9]{2}$", 
      x = names(df), 
      value = TRUE
    )
  ))
  for (i in vars) {
    new_name <- paste0("VA_tot", i)
    male_name <- paste0("VA01", i)
    female_name <- paste0("VA02", i)
    df[[new_name]] <- NA
    df[[new_name]][!is.na(df[[female_name]])] <- 
      df[[female_name]][!is.na(df[[female_name]])]
    df[[new_name]][!is.na(df[[male_name]])] <- 
      df[[male_name]][!is.na(df[[male_name]])]
  }
  return(df)
}

It could probably get prettier than this, but this does the job.

c <- structure(
  list(
    sex = structure(
      c(1L, 2L, 1L, 2L),
      .Label = c("female", "male"),
      class = "factor"
    ),
    VA01_01 = c(1, NA, 2, NA),
    VA01_02 = c(4, NA, 4, NA),
    VA02_01 = c(NA, 3, NA, 4),
    VA02_02 = c(NA, 5, NA, 3)
  ),
  .Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"),
  row.names = c(NA, -4L),
  class = "data.frame"
)
new_vars(c)

#       sex VA01_01 VA01_02 VA02_01 VA02_02 VA_tot_01 VA_tot_02
# 1 female        1       4      NA      NA         1         4
# 2   male       NA      NA       3       5         3         5
# 3 female        2       4      NA      NA         2         4
# 4   male       NA      NA       4       3         4         3
Edgar
  • 256
  • 2
  • 12