Filling in missing values with duplicate name?

Question

I apologize if this is a repeat, I have looked through many answers but I haven't found one that really hits on what I am trying to do.

I have a dataset that contains duplicate names but doesn't not necessarily have an assigned account number. For example:

df <- data.frame(Name = c("Hilton", "Comcast", "Comcast", "Comcast", "Froyos", "Froyos", "BigFive"), 
                 Account = c("123", "456", NA, NA, "789", NA, "111"))
df
     Name Account
1  Hilton     123
2 Comcast     456
3 Comcast    <NA>
4 Comcast    <NA>
5  Froyos     789
6  Froyos    <NA>
7 BigFive     111

And I want to match the name to fill in the associated account number so I would look like this:

     Name Account
1  Hilton     123
2 Comcast     456
3 Comcast     456
4 Comcast     456
5  Froyos     789
6  Froyos     789
7 BigFive     111

Ensuring all the classes are the same, I've tried making a separate list and using ifelse and %in%, but it is not assigning the correct value to the name. My code looks like:

library(dplyr)

df$Name <- as.character(df$Name)
df2$Name <- as.character(df2$Name)
df$Account <- as.numeric(as.character(df$Account))
df2$Account <- as.numeric(as.character(df2$Account))

df2 <- df %>% 
  filter(as.numeric(Account) > 0)

df3 <- within(df, {New = ifelse(df$Name %in% df2$Name,
                                          df2$Account, NA)})

I feel like this should be fairly straight forward, but I'm having a hard time knowing how to word the problem so it does it correctly. Any help or direction would be greatly appreciated.

Will there only be one non-`NA` value for each group of `Name`? — markus, Nov 22 '19 at 22:03
A `data.table` way: `library(data.table); setDT(df); df[, Account := df[!is.na(Account)][df, on=.(Name), x.Account]]`. For a reference see [Which data.table syntax for left join (one column) to prefer](https://stackoverflow.com/questions/54312225/which-data-table-syntax-for-left-join-one-column-to-prefer) — markus, Nov 22 '19 at 22:21

score 1 · Accepted Answer · answered Nov 22 '19 at 21:55

Note stringsAsFactors = F

df <- data.frame(Name = c("Hilton", "Comcast", "Comcast", "Comcast", "Froyos", "Froyos", "BigFive"), 
                 Account = c("123", "456", NA, NA, "789", NA, "111"), stringsAsFactors = F)



df %>% group_by(Name) %>% mutate(Account = max(Account, na.rm = T)) %>% ungroup()

Filling in missing values with duplicate name?

1 Answers1