1

I found sort of the reverse question here: R: Replace multiple values in multiple columns of dataframes with NA

But I couldn't make it work with my data. In my case, I want to find the NA's and replace them with the value from another column.

I have a dataset dta1 in which there are 2493 variables I am interested in manipulating. Aside from these 2493 variables there's a column var_fill. When any of the columns named in vars is NA I want to fill it in with the value from var_fill. I tried reverse engineering the solution posted above but it gives me multiple warnings of:

1: In `[<-.factor`(`*tmp*`, list, value = structure(c(16946L,  ... :   invalid factor level, NA generated  
2: In x[...] <- m :   number of items to replace is not a multiple of replacement length

And also just doesn't work.

vars <- sprintf("var%0.4d",seq(1:2493))

dta1[vars] <- lapply(dta1[vars], function(x) replace(x,is.na(x), dta1$var_fill) )

I apologize but because of the size of this data I couldn't generate a full reproducible dataset so I heavily subsetted it but I am working with about 3000 columns and 240K rows of data.

Here's the data: https://drive.google.com/file/d/1oj_nhd99ftgN1Bh930_IRQftLACR2FO9/view?usp=sharing

It's too big to post even though it's only 10 people.

JeniFav
  • 113
  • 1
  • 9
  • have you tried `dta1[vars] <- lapply(dta1[vars], function(x) ifelse(is.na(x), dta1$var_fill, x))` instead? you'll need to coerce the factor variables to character to avoid those warnings, you can do that when you read in the data – rawr Aug 24 '21 at 23:06
  • Do you want to replace all NA row values in that row with the var_fill column value at that row, or something more exotic? – Chris Aug 24 '21 at 23:09
  • You don't need to share with us all the data. A sample dataframe with 10 rows and 4-5 columns along with expected output would be enough to help us understand the problem. – Ronak Shah Aug 25 '21 at 03:10
  • @rawr, I tried that but it did a couple of strange things so I went with the option below but thanks for responding! – JeniFav Aug 25 '21 at 17:41

1 Answers1

1

Turn the columns to characters and replace the NA values with the corresponding var_fill value.

dta1$var_fill <- as.character(dta1$var_fill)

dta1[vars] <- lapply(dta1[vars], function(x) {
                      x <- as.character(x)
                      x[is.na(x)] <- dta1$var_fill[is.na(x)]
                      x
                })

In dplyr, you can use coalesce.

library(dplyr)
dta1 <- dta1 %>% mutate(across(all_of(vars), ~coalesce(., var_fill)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • This did it! Except it worked better to not convert var_fill to character. When I did that it did some strange things to some of them but leaving at numeric worked great, thanks! – JeniFav Aug 25 '21 at 17:41