
I apologize in advance if this question is very basic, as I am still new to R. I have looked on Stack Overflow for similar questions, but I still can't resolve the problem I am facing.

I am currently working on a large dataset X. What I am trying to do is pretty simple: I want to replace all NAs in selected (non-consecutive) columns with "no".

I first created a variable holding the names of all the columns that I want to modify. For instance, to modify the NAs in the columns named "m", "l" and "h", I wrote the following:

modify <- c("m","l","h") 

for (i in 1:length(modify))
  column <- modify[i] 
  X$column <- as.character(X$column) #X is my dataframe
  X$column %>% replace_na("no")

This loop returned output only for the "m" column, which is the first name in my modify variable. However, even though the loop produced output, when I checked X$m afterwards nothing had changed in my original dataset.

I also tried to create a function, which is very similar to the loop. Even though no error message was generated, it didn't work as I do not know what the return value should be.

Why can't the loop be applied to my entire dataset when the individual steps in the loop work?

Thank you so so much for your help!

  • `X$column %>% replace_na("no")` produces output but does not change X$column. @scrameri's approach is more in keeping with the principles of `dplyr`, which you might be using for the `magrittr` pipe. – Jon Spring Sep 22 '21 at 00:10

4 Answers


This might help. It is close to one of the answers here, but uses all_of() to select the columns:

library(tidyverse)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df
#> # A tibble: 3 × 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a    
#> 2     2 <NA> 
#> 3    NA b

modify <- c("x","y")

df %>%
  mutate(
    across(all_of(modify), ~replace_na(.x, 0))
  )
#> # A tibble: 3 × 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a    
#> 2     2 0    
#> 3     0 b

Created on 2021-09-22 by the reprex package (v2.0.1)
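One caveat about the call above: writing the number 0 into the character column y relies on replace_na() converting types silently. More recent tidyr releases appear to be stricter about mixing types and may raise an error instead, so please check this against your installed version. Here is a minimal sketch that sidesteps the issue, assuming it is acceptable for every selected column to become character (which the "no" replacement from the question requires anyway). The name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
modify <- c("x", "y")

# Convert each selected column to character first, so the
# replacement value "no" has the same type as the column
result <- df %>%
  mutate(across(all_of(modify), ~ replace_na(as.character(.x), "no")))

result
```

Note that mutate() returns a new tibble, so assign the result back (for example `df <- df %>% mutate(...)`) if the original should change.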

scrameri

Here's a base R approach modifying data from @scrameri.

df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"), c = c(1, NA, 5))
modify <- c('x', 'y')
df[modify][is.na(df[modify])] <- 'no'
df

#   x  y  c
#1  1  a  1
#2  2 no NA
#3 no  b  5
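As the output shows, writing 'no' into the numeric column x turns the whole column into character. If that conversion should be explicit, a column-by-column variant could look like the sketch below. It uses only base R and the same example data; the anonymous helper function is just for illustration.

```r
df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"), c = c(1, NA, 5))
modify <- c("x", "y")

# For each selected column: convert to character, then fill the NAs
df[modify] <- lapply(df[modify], function(col) {
  col <- as.character(col)
  col[is.na(col)] <- "no"
  col
})

# Column classes after the replacement; c was not selected and is unchanged
sapply(df, class)
```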
Ronak Shah
  • Thank you so much Ronak. Yesterday I tried with something similar (in my mind) ``` bvFTD.first[behaviour.diagnosis] <- bvFTD.first[behaviour.diagnosis] %>% replace_na("no") ``` But it did not work. Now with the base R approach it is definitely clearer, thank you ;) – Helen Andrews Sep 22 '21 at 11:29

I'm going to fix your code with as few changes as possible, so you can learn.

There are two big problems. First, the for loop needs to have curly braces {} around the lines you want to loop over. Second, if you want to reference variables in a data frame dynamically, you can't use the $ operator. You have to use double brackets [[]].

library(tidyr)

X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"), h = c(1, NA, 5))

modify <- c("m","l","h")

for (i in seq_along(modify)) {
  column <- modify[i]
  X[[column]] <- as.character(X[[column]]) #X is my dataframe
  X[[column]] <- X[[column]] %>% replace_na("no")
}

X

#    m  l  h
# 1  1  a  1
# 2  2 no no
# 3 no  b  5

You can do what you were trying to do much more efficiently, as shown in the other answers. But I wanted to show you how to do it the way you were attempting, so that you understand how for loops and the subset operators work. These are basics that everyone should get comfortable with when first learning R.
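Both points can be checked in isolation. The following is a small base R sketch, not part of the original code; the names `dollar_result`, `bracket_result`, `runs` and `i_seen` are made up for the illustration.

```r
X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"))
column <- "m"

# `$` looks for a column literally named "column"; X has none, so this is NULL
dollar_result <- X$column
is.null(dollar_result)

# `[[` evaluates the variable and uses its value ("m") as the column name
bracket_result <- X[[column]]
identical(bracket_result, X$m)

# Without braces, only the first statement after for() is the loop body
runs <- 0
for (i in 1:3)
  i_seen <- i        # repeated three times
  runs <- runs + 1   # executed once, after the loop has finished

runs
```

Here `runs` ends up as 1 rather than 3, which is why the braces matter.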

You might want to go through a beginner's tutorial to solidify your understanding. I used tutorialspoint when I was first learning and found it useful.

Dharman
David J. Bosak
  • Thank you so much Dharman and David! I did not know the difference between [] and [[]] before, thank you so much for pointing it out. And in this way I had definitely learnt! I had been stuck with these few lines for hours yesterday, thank you! – Helen Andrews Sep 22 '21 at 07:49

We could do this efficiently with set from data.table

library(data.table)
setDT(X)
for(nm in modify) {
    set(X, i = NULL, j= nm, value = as.character(X[[nm]]))
    set(X, i = which(is.na(X[[nm]])), j = nm, value = 'no')
}

-output

> X
    m  l  h  i
1:  1  a  1 NA
2:  2 no no  5
3: no  b  5  6

data

X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"), 
     h = c(1, NA, 5), i = c(NA, 5, 6))
modify <- c("m","l","h")
akrun
  • Thank you so much akrun. I have just finished reading the manual for data.table, it is such a powerful package! Thank you! – Helen Andrews Sep 22 '21 at 11:42