I am working with a panel data set, so the same question has been asked several times over the span of a few years. I want to create a new variable that takes its values from the oldest wave. Where the oldest wave is NA, I want to fill in ONLY those cases with the "newer" values from the second wave. Where both the first and second waves are missing, I want to fill in only those cases with the values from the third wave. To do this I am using mutate with case_when and combining conditions inside case_when. However, either I cannot overwrite the old values with newer ones, or I produce almost only missing values because the condition is almost never true. For that second case, I would like to know how to first take the values from the first wave and replace them ONLY where they are NA. I tried the following variants:

1.

gles_panel2 <- gles_panel2 %>%
  mutate(test3 = case_when(kp7_180 %in% c(2) ~ "1",
                           kp7_180 %in% c(1, 6) ~ "0",
                           kp7_180 < 1 ~ "NA",
                           kp7_180 %in% NA & kp10_2780 == 2 ~ "1",
                           kp7_180 %in% NA & kp10_2780 == 1 ~ "0",
                           kp7_180 %in% NA & kp10_2780 < 1 ~ "NA"))

2.

gles_panel2 <- gles_panel2 %>%
  mutate(test3 = case_when(kp7_180 %in% c(2) ~ "1",
                           kp7_180 %in% c(1, 6) ~ "0",
                           kp7_180 < 1 ~ "NA"))
gles_panel2 <- gles_panel2 %>%
  mutate(test3 = case_when(test3 %in% NA & kp10_2780 == 2 ~ "1",
                           test3 %in% NA & kp10_2780 == 1 ~ "0",
                           test3 %in% NA & kp10_2780 < 1 ~ "NA"))

I got the result I want with nested ifelse calls (see below), but I would like to get the same result with case_when.

gles_panel2 <- gles_panel2 %>%
  mutate(nwahl2013_test = ifelse(kp7_180 == 2, 1,
                          ifelse(kp7_180 %in% c(1, 6), 0, NA)))
gles_panel2 <- gles_panel2 %>%
  mutate(nwahl2013_test = ifelse(is.na(nwahl2013_test) & kp10_2780 == 2, 1,
                          ifelse(is.na(nwahl2013_test) & kp10_2780 == 1, 0,
                                 nwahl2013_test)))
    Welcome to SO! To help us help you, would you mind making your issue reproducible by sharing a snippet of your data? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). To share your data, you could type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations, you could use e.g. `dput(head(NAME_OF_DATASET, 20))` for the first twenty rows of data. – stefan Nov 24 '21 at 09:52

1 Answer

I solved the problem the following way:

# Wave 1: rows that match neither condition are left as NA
gles_panel2 <- gles_panel %>%
  mutate(nwahl2013 = case_when(kp7_180 %in% c(2) ~ 1,
                               kp7_180 %in% c(1, 6) ~ 0))
# Wave 2: fill only the rows that are still NA, keep all other values
gles_panel2 <- gles_panel2 %>%
  mutate(nwahl2013 = case_when(is.na(nwahl2013) & kp10_2780 %in% c(2) ~ 1,
                               is.na(nwahl2013) & kp10_2780 %in% c(1, 6) ~ 0,
                               !is.na(nwahl2013) ~ nwahl2013))
# Wave 3: same pattern for the rows that are still NA after wave 2
gles_panel2 <- gles_panel2 %>%
  mutate(nwahl2013 = case_when(is.na(nwahl2013) & kp17_2780 %in% c(2) ~ 1,
                               is.na(nwahl2013) & kp17_2780 %in% c(1, 6) ~ 0,
                               !is.na(nwahl2013) ~ nwahl2013))

The main problem was telling R that rows which do not meet the left-hand-side conditions in the first two lines of case_when should keep their existing value from the nwahl2013 variable. Without that last line, case_when returns NA for every row that matches no condition, which wipes out the values already taken from the earlier wave. This is one possible solution in case anyone else faces the same problem :-)
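For anyone who wants the same "first wave, then fill NAs from later waves" logic in one step, here is a minimal sketch that swaps in dplyr's coalesce(), which returns the first non-missing value across its arguments. It is not the code from the post above. The toy data and the helper name recode_wave are hypothetical, and I am assuming every wave uses the same coding as the answer (2 becomes 1, 1 or 6 becomes 0, everything else is missing).

```r
library(dplyr)

# Hypothetical toy data, only for illustration (not the real GLES panel).
toy <- tibble(
  kp7_180   = c(2, 1, NA, -99, NA, NA),
  kp10_2780 = c(1, 2, 2, 6, NA, -99),
  kp17_2780 = c(2, 2, 1, 1, 6, NA)
)

# Hypothetical helper: recode one wave. Every value that matches no
# condition becomes a real NA (NA_real_), not the string "NA".
recode_wave <- function(x) {
  case_when(
    x %in% 2       ~ 1,
    x %in% c(1, 6) ~ 0,
    TRUE           ~ NA_real_
  )
}

# coalesce() takes wave 1 where it is not NA, otherwise wave 2,
# otherwise wave 3.
toy <- toy %>%
  mutate(
    nwahl2013 = coalesce(
      recode_wave(kp7_180),
      recode_wave(kp10_2780),
      recode_wave(kp17_2780)
    )
  )
```

On this toy data nwahl2013 comes out as 1, 0, 1, 0, 0, NA: the first two rows come from the first wave, rows three and four from the second, row five from the third, and the last row stays NA because no wave has a valid answer. Recoding each wave separately also avoids the explicit is.na() conditions, since a value that is already set is never touched.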