I'm working with data in which the gender of participants has been recorded in three columns (baseline, first time point and second time point). Several hundred participants have the same gender recorded at baseline and at the second time point, but NA at the first time point.

I'm trying to write code that replaces the NA in this column when the same value (1 = male, 2 = female) is present in the column before it (baseline) and the column after it (second time point). My knowledge is basic as I'm currently learning R.

Please could anybody suggest some code that might work? I'm trying to figure out a for loop with if/else statements, but not quite getting there.

df$tp1 is first time point, df$base is baseline, df$tp2 is second time point

   for(i in df$tp1) {
       if (df$base == df$tp2) 
           df <- replace(df$tp1, df$tp1 =="NA", df$base)
       else (df$tp1 == df$tp1)}
   print(df)

Edit: Please find below an example of the data

     Baseline          TP1               TP2
        1               NA                1
        1               1                 1
        2               NA                2
        1               1                 1
        1               1                 1
        2               2                 2

In rows 1 and 3, I would like to change the NA to the same number as baseline and second time point, i.e. 1 and 2 respectively.

    dput(head(gender_only, 15))
    structure(list(Baseline = c(1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 2L, 2L, 2L), `First-timepoint` = c(NA, 1L, NA, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), `second-timepoint` = c(1L,
    1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, 1L, 1L, 2L, 2L, NA)), row.names = c(NA,
    15L), class = "data.frame")
  • Hi there and welcome to SO. Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a sample input and your expected output. [Please do not upload images of code/data/errors when asking a question.](//meta.stackoverflow.com/q/285551) Based on your input we can understand your problem and think about a possible solution and verify it compared to your expected output. – Martin Gal May 15 '22 at 21:11
  • As requested provide some data using `dput(head(df, 15))` and pasting the results into your code block. The code you provided will not work because R does not accept numbers as column names. You can get around it with constructions such as `df$'11'`, but it is a lot of extra typing and liable to generate errors if you forget a quote. Provide code that you have actually used including the error messages. – dcarlson May 15 '22 at 21:54

1 Answer

You can try using `mutate` and `case_when` from the dplyr package. From the `case_when` documentation: "This function allows you to vectorise multiple if and else if statements."

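    # If you haven't installed it yet:
    # install.packages('dplyr')
    library(dplyr)

    df <- df %>%
      mutate(TP1 = case_when(
        is.na(TP1) & Baseline == TP2 ~ TP2,  # fill NA only when baseline and TP2 agree
        TRUE ~ TP1                           # otherwise keep the existing value
      ))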
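
If you would rather avoid a package, the same idea can be sketched in base R with logical indexing instead of a loop. This is only a minimal sketch: the data frame is rebuilt from the six example rows in the question, the column names `Baseline`, `TP1` and `TP2` are assumed from that example table, and `fill` is just an illustrative variable name.

```r
# Example rows from the question (TP1 is NA in rows 1 and 3)
df <- data.frame(
  Baseline = c(1L, 1L, 2L, 1L, 1L, 2L),
  TP1      = c(NA, 1L, NA, 1L, 1L, 2L),
  TP2      = c(1L, 1L, 2L, 1L, 1L, 2L)
)

# Row numbers where TP1 is missing and baseline and TP2 agree.
# which() drops rows where the comparison itself is NA
# (for example when TP2 is also missing), so those rows are left alone.
fill <- which(is.na(df$TP1) & df$Baseline == df$TP2)

# Copy the baseline value into TP1 for those rows only
df$TP1[fill] <- df$Baseline[fill]

df
```

With the column names from your `dput` output you would need backticks, because names containing a hyphen are not syntactic in R, e.g. ``df$`First-timepoint` `` and ``df$`second-timepoint` ``.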
uow