I'm looking to replace the NA values in this example data frame with either 'A' or 'B' depending on their 'second' column category: (A for A1, B for B1)

df <- data.frame(first = c("A","A",NA,NA,"B",NA,NA,NA),second = c(rep("A1",4),rep("B1",4)))
df
  first second
1     A     A1
2     A     A1
3  <NA>     A1
4  <NA>     A1
5     B     B1
6  <NA>     B1
7  <NA>     B1
8  <NA>     B1

This is what I would like the resulting data frame to look like:

  first second
1     A     A1
2     A     A1
3     A     A1
4     A     A1
5     B     B1
6     B     B1
7     B     B1
8     B     B1

I tried this solution but obviously it didn't work:

df$first[is.na(df$first)] <- unique(df[!is.na(df$first),"first"])

I have a feeling there might be a dplyr solution but cannot think of it.

Thank you!

ESlice
  • `df$first[is.na(df$first)] = substr(df$second[is.na(df$first)], 1, 1)` – tblznbits Oct 19 '17 at 21:33
  • I don't think this is an exact duplicate of question 23340150. The aim here is to replace NA based on the value of a second column, not the most recent non-NA of the same column. – neilfws Oct 19 '17 at 21:48

1 Answer

No need for dplyr. This should work in base R:

df$first[is.na(df$first)] <- gsub("(\\w)\\d", "\\1", df$second[is.na(df$first)])

Explanation: gsub matches a [word character][digit] pair in second and keeps only the captured first character ("\\1"), so "A1" becomes "A" and "B1" becomes "B". The assignment then writes those letters into the NA entries of first.

  first second
1     A     A1
2     A     A1
3     A     A1
4     A     A1
5     B     B1
6     B     B1
7     B     B1
8     B     B1
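
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes both columns are plain character vectors, so `stringsAsFactors = FALSE` is set explicitly (that argument and the helper name `missing_rows` are additions for illustration, not part of the original post).

```r
# Rebuild the example data; stringsAsFactors = FALSE (an assumption added here)
# keeps both columns as character vectors on any R version
df <- data.frame(
  first  = c("A", "A", NA, NA, "B", NA, NA, NA),
  second = c(rep("A1", 4), rep("B1", 4)),
  stringsAsFactors = FALSE
)

# Logical index of the rows where `first` is missing
missing_rows <- is.na(df$first)

# Strip the trailing digit from `second` and use the remaining letter as the fill value
df$first[missing_rows] <- gsub("(\\w)\\d", "\\1", df$second[missing_rows])

df
```

Computing the NA index once before the assignment avoids evaluating `is.na(df$first)` twice and makes it clear that both sides refer to the same rows.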
Maurits Evers
  • I believe best practice is to avoid regular expressions when possible. – tblznbits Oct 19 '17 at 21:39
  • I disagree. Given that `second` is a string, regexp is the way to go. It allows for way more flexibility than substring extractions based on coordinates... – Maurits Evers Oct 19 '17 at 21:42
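
To make the trade-off in these comments concrete, here is a small sketch comparing the two approaches. The code `"AB12"` is hypothetical and does not appear in the question's data, and the pattern `"\\d+$"` is a variation chosen for illustration rather than the one used in the answer.

```r
# "AB12" is a hypothetical multi-letter code, added only to show where the approaches differ
codes <- c("A1", "B1", "AB12")

# Position-based extraction: always keeps exactly the first character
by_position <- substr(codes, 1, 1)

# Pattern-based extraction: removes the trailing run of digits, whatever its length
by_pattern <- sub("\\d+$", "", codes)

by_position  # "A" "B" "A"
by_pattern   # "A" "B" "AB"
```

For the single-letter, single-digit codes in the question both give the same result, so the choice only matters if the codes can vary in shape.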