0

I have 2 columns with survey answers:

subject_id | income | income unknown |
1          | 1000   | NA             |
2          | 15000  | NA             |
3          | NA     | 15000          |
4          | NA     | 10000          |
5          | 15000  | NA             |

I would like to merge the two columns. The second column has a value if the 1st income column was not answered (in most cases at least).

I thought of substituting NAs with zeros and then summing the two columns up, but that doesn't really work.

zx8754
  • 52,746
  • 12
  • 114
  • 209
PCUnique
  • 127
  • 1
  • 1
  • 8

2 Answers2

2

We can use coalesce

library(tidyverse)
df1 %>%
    transmute(subject_id, income = coalesce(income, income_unknown))
#   subject_id income
#1          1   1000
#2          2  15000
#3          3  15000
#4          4  10000
#5          5  15000

Other base R options include rowSums, pmax, pmin etc if there is only one non-NA element per row

df1$income <- rowSums(df1[-1], na.rm = TRUE)
df1$income <- do.call(pmax, c(df1[-1], na.rm = TRUE))

data

df1 <- structure(list(subject_id = 1:5, income = c(1000L, 15000L, NA, 
NA, 15000L), income_unknown = c(NA, NA, 15000L, 10000L, NA)),
.Names = c("subject_id", 
"income", "income_unknown"), class = "data.frame",
 row.names = c(NA, -5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

You can use ifelse()

df$income.new <- ifelse(is.na(df$income), df$income_unknown, df$income)
jogo
  • 12,469
  • 11
  • 37
  • 42