
I have a dataset where I need to replace certain values in different columns with NA. Because the data come from a survey whose questions have different numbers of possible answers, the value that should become NA differs between columns. I know how to change the values like this:

ayw <- ayw %>% 
  mutate(ExperienceWMentorSRP_01 = ifelse(ExperienceWMentorSRP_01 == 5, NA, ExperienceWMentorSRP_01))

or like this:

ayw$ExperienceWMentorSRP_01 <- replace(ayw$ExperienceWMentorSRP_01, ayw$ExperienceWMentorSRP_01 ==5, NA)

In some sections I repeat this line more than a dozen times, changing only the number after the underscore. I feel there must be a more efficient way that doesn't require editing the column name by hand every time.

I tried making a function:

na.fun <- function(dataset, column, nanumber){
              dataset$column[dataset$column == nanumber] = NA
          }

na.fun(ayw, ExperienceWMentorSRP_01, 5)

but I get the following error:

! Assigned data <lgl> must be compatible with existing data. ✖ Existing data has 155 rows. ✖ Assigned data has 0 rows.

I suspect I'm going in the wrong direction: even with a working function, I would still have to call it n times unless I can get a loop to work. I tried that as well:

  for (row in mentorset){  # mentorset is a subset made with select() so I don't mess up anything else
      for (col in row){
         ifelse(col == 5, NA, col)
      }
  }

But when I assign the result with mentorset <- and print it, I end up with a data frame holding only one value, e.g. '4'. I assume it keeps only the last iteration of the loop, and that's the reason.

How could I solve this problem? Am I better off just writing it manually?

Jaime D.


dataset$column looks for a column literally named column, not one named after the symbol you passed in (see Dynamically select data frame columns using $ and a character value and The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe). You're getting into the realm of non-standard evaluation (NSE), which can be done but is often fraught with peril and difficult debugging ... though if you must, look at the tidyselect package and http://adv-r.had.co.nz/Computing-on-the-language.html.
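A small illustration of the difference, using a made-up data frame df and a variable column that holds the name as a string (both are hypothetical stand-ins for ayw and your argument):

```r
# Toy data frame standing in for the survey data (hypothetical values)
df <- data.frame(Q_01 = c(1, 5, 3), Q_02 = c(5, 2, 5))

column <- "Q_01"   # a variable holding the column name

df$column          # $ looks for a column literally named "column"
# NULL
df[[column]]       # [[ ]] uses the *value* of `column`, i.e. "Q_01"
# [1] 1 5 3
```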

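FYI, the <lgl> error you're receiving is because dataset$column returns NULL, so the comparison produces a zero-length logical vector, which then cannot be assigned as a column of your 155-row data. A nonexistent column returns NULL, as in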

mtcars$DOES_NOT_EXISTS
# NULL
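A sketch of the remaining steps of that failure, on a hypothetical 3-row data frame: comparing NULL with a number gives a zero-length logical vector, and storing a zero-length vector as a column is rejected.

```r
df <- data.frame(Q_01 = c(1, 5, 3))   # hypothetical 3-row data

sel <- df$column == 5   # NULL == 5
length(sel)             # a zero-length logical vector
# [1] 0

# Assigning a zero-length vector as a column of a 3-row data frame fails.
# A base data.frame reports "replacement has 0 rows, data has 3";
# the "Assigned data <lgl> ..." wording in the question comes from a tibble.
res <- tryCatch({
  df$column <- logical(0)
  "no error"
}, error = function(e) "error")
res
# [1] "error"
```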

If you can accept using quoted column names, as in na.fun(ayw, "Experience", 5), then try

na.fun <- function(dataset, column, nanumber){
  # [[ ]] uses the value of `column` (a string), unlike $
  dataset[[ column ]][ dataset[[ column ]] == nanumber ] <- NA
  dataset  # return the modified copy to the caller
}

Note that your function appears to rely on a side effect, expecting the change to persist in the data outside the function. With few exceptions this will not happen: R uses copy-on-modify semantics, not reference semantics. The moment you change values in the column, the dataset seen inside na.fun becomes a copy of ayw, so ayw outside the call is unchanged.
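A minimal demonstration of copy-on-modify, with two hypothetical helper functions (set_first_na and set_first_na2, not from the original code): modifying the argument changes only the function's local copy, and only returning and reassigning the result persists the change.

```r
# Modifies its argument but returns nothing useful
set_first_na <- function(d) {
  d$x[1] <- NA        # changes a local copy of `d`, not the caller's object
  invisible(NULL)
}

df <- data.frame(x = 1:3)
set_first_na(df)
print(df$x)           # the caller's data frame is unchanged
# [1] 1 2 3

# Returns the modified copy instead
set_first_na2 <- function(d) {
  d$x[1] <- NA
  d
}

df <- set_first_na2(df)   # reassigning the result is what persists the change
print(df$x)
# [1] NA  2  3
```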

To fix this, we do two things: return the dataset from inside the function (see my code above), and capture the result outside of it:

ayw <- na.fun(ayw, "ExperienceWMentorSRP_01", 5)

(notice the quotes around the variable name)
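If you do want to keep the unquoted call from the question, a minimal base-R NSE sketch is below. The name na_fun_nse and the toy data are hypothetical; it converts the symbol to a string with deparse(substitute()) and then reuses the [[ ]] logic. It is fragile in exactly the ways mentioned above: passing a quoted string yields a name with embedded quotes, and calling it with a loop variable (e.g. cn) yields "cn" rather than the column name, which is why the quoted version is easier to loop over.

```r
# Hypothetical variant: accept an unquoted column name via base-R NSE
na_fun_nse <- function(dataset, column, nanumber) {
  col <- deparse(substitute(column))   # turns the symbol Q_01 into "Q_01"
  dataset[[col]][dataset[[col]] == nanumber] <- NA
  dataset
}

df <- data.frame(Q_01 = c(1, 5, 3))    # hypothetical data
df <- na_fun_nse(df, Q_01, 5)          # no quotes needed
df$Q_01
# [1]  1 NA  3
```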

r2evans
  • Your code for the function worked as intended, thank you! Now if I wanted to do a loop with this function, is it possible? To illustrate what I mean: for (i in mentorset){ ayw <- na.fun(ayw, "MentorComp_[i]", 5) } I know it's wrong, but I just wanted to give an example of what I'm trying to do. – Jaime D. Feb 20 '23 at 22:34
  • `"MentorComp_[i]"` is just a string; it is not going to be indexed the way I think you intend. You might do something like `for (cn in grep("MentorComp_", names(ayw), value=TRUE)) ayw <- na.fun(ayw, cn, 5)`, which assumes `5` for each column. – r2evans Feb 20 '23 at 22:44
  • Apologies for the late reply, but that's amazing, it worked! Thank you so much! I assume grep() was the key here? I looked it up briefly to see that it's about pattern matching, which makes a lot of sense, even though I don't fully understand how it works. I'm still relatively new to all R can do. Anyway, I really appreciate your help. – Jaime D. Feb 21 '23 at 21:26
  • `grep` is a "Swiss Army knife" of tools (along with `grepl`): `grep` can return integer indices from within the `x=` vector, or it can return the values themselves with `value=TRUE`; there are definitely other ways to get at the same thing. – r2evans Feb 21 '23 at 21:56
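A sketch that extends the grep() loop from the comments to the question's original point, that different question blocks use different "no answer" codes. The data frame and the prefix-to-code mapping in na_codes are hypothetical; adjust them to the actual survey. It also shows the two grep() modes described above.

```r
# The quoted-name na.fun from the answer
na.fun <- function(dataset, column, nanumber) {
  dataset[[column]][dataset[[column]] == nanumber] <- NA
  dataset
}

# Hypothetical survey data: two question blocks with different NA codes
ayw <- data.frame(
  MentorComp_01 = c(1, 5, 3),
  MentorComp_02 = c(5, 5, 2),
  Other_01      = c(7, 1, 7)
)

grep("MentorComp_", names(ayw))                # integer positions
# [1] 1 2
grep("MentorComp_", names(ayw), value = TRUE)  # the names themselves
# [1] "MentorComp_01" "MentorComp_02"

# Assumed mapping: one NA code per column-name prefix
na_codes <- c(MentorComp_ = 5, Other_ = 7)

for (prefix in names(na_codes)) {
  # "^" anchors the match so the prefix must start the column name
  for (cn in grep(paste0("^", prefix), names(ayw), value = TRUE)) {
    ayw <- na.fun(ayw, cn, na_codes[[prefix]])
  }
}
ayw
```

Looping over column names and reassigning ayw each time also avoids the problem in the question's loop: `for (row in mentorset)` actually visits the columns of the data frame, and the result of `ifelse()` inside it is never stored.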