Clean data using function that calls variable names

Question

I'm trying to write a function that cleans my variables, replacing "*" with "1" and NAs with "0". I can do this easily with an ifelse, but I wanted it to be clean and use functional programming, and I clearly am not there yet...

An example database is:

db <- data.frame(
  name = c("Abel", "Abner", "Bianca", "Pedro", "Lucas"),
  scholarship1 = c("*", "*", "*", "*", NA),
  scholarship2 = c("*", NA, NA, "*", "*"))

My function is something like this:

Dichotomizer <- function(database, variable) {
  variable <- enquo(variable)
  database$variable <- ifelse(
    is.na(database$variable),
    0,
    1
  )
}

But it obviously doesn't work, and I can't figure out why. I tried using eval and substitute, but I still get errors.

I'd appreciate any input on how to solve this. Thanks.

I assume you're looking for something like this? `Dichotomizer = function(database, variable) { database[, variable] = ifelse(is.na(database[, variable]), '0', ifelse(database[, variable] == '*', '1', database[, variable])); database }` then call `Dichotomizer(db, 'scholarship1')` — Sean Lin, May 08 '18 at 15:41
I'm not sure what you mean by "use functional programming". Do you mean you want to use Non-Standard Evaluation (NSE)? How do you want to call your function? Standard Evaluation (SE) would be easy (no need for `eval`, `substitute`, `enquo`...) if you use a quoted column name, e.g., `Dichotomizer(db, "scholarship1")`. Sean's comment above shows a SE solution. NSE would be required for an unquoted column name `Dichotomizer(db, scholarship1)`. — Gregor Thomas, May 08 '18 at 15:41
Also, never just say "doesn't work" - describe how it doesn't work. If you get errors, post the error messages. Error messages are super helpful. If you don't get errors but get an unexpected result, describe it and explain why it is wrong. — Gregor Thomas, May 08 '18 at 15:44
@Gregor The "functional programming" part was what I thought I would need, but I see now that it isn't needed. And I didn't post the error messages because I knew I wasn't anywhere near a good solution, so I chose to omit them. — GVianaF, May 08 '18 at 15:55
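Following up on the NSE point in the comments: the original function fails because `database$variable` refers to a column literally named `variable` rather than to the column whose name was passed in, and because the function never returns the modified data frame. Below is a minimal base-R sketch that keeps the unquoted call style (assuming that is how you want to call it). It captures the name with `deparse(substitute())` instead of `enquo()` and indexes with `[[`. The `stringsAsFactors = FALSE` argument is an addition that keeps the columns as character vectors on R versions before 4.0.

```r
# Sample data from the question.
db <- data.frame(
  name = c("Abel", "Abner", "Bianca", "Pedro", "Lucas"),
  scholarship1 = c("*", "*", "*", "*", NA),
  scholarship2 = c("*", NA, NA, "*", "*"),
  stringsAsFactors = FALSE
)

# Accepts an unquoted column name, e.g. Dichotomizer(db, scholarship1).
Dichotomizer <- function(database, variable) {
  # substitute() captures the unevaluated argument; deparse() turns it
  # into a string such as "scholarship1".
  col <- deparse(substitute(variable))
  # `[[` looks the column up by the string's value, whereas `$` would look
  # for a column literally named "variable".
  # Same rule as the question: NA becomes 0, any other value becomes 1.
  database[[col]] <- ifelse(is.na(database[[col]]), 0, 1)
  # Return the modified copy; R does not change `db` in place.
  database
}

db2 <- Dichotomizer(db, scholarship1)
db2$scholarship1
```

As with any function in R, assign the result (here to `db2`) to keep the change.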

score 0 · Answer 1 · answered May 08 '18 at 15:52

Since your example function works on one column at a time, here is a working version that does the same:

db <- data.frame(
    name = c("a", "b", "c", "d", "e"),
    scholarship1 = c("*", "*", "*", "*", NA),
    scholarship2 = c("*", NA, NA, "*", "*"))

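dichotomizer <- function(database, variable) {
    # Start with 0 for every row, then set rows whose value is "*" to 1.
    # which() skips NA comparisons, so NA rows stay 0.
    copy <- rep(0, nrow(database))
    copy[which(database[[variable]] == "*")] <- 1
    database[[variable]] <- copy
    return(database)
}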

new_db <- dichotomizer(db, "scholarship1")
final_db <- dichotomizer(new_db, "scholarship2")
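With many columns, the chained calls can be folded into one with base R's `Reduce()`, which passes the result of each call into the next, so `init` is the starting data frame and each column name is applied in turn. A sketch, assuming the same `dichotomizer()` as above (repeated here with `[[` indexing so the snippet runs on its own):

```r
db <- data.frame(
    name = c("a", "b", "c", "d", "e"),
    scholarship1 = c("*", "*", "*", "*", NA),
    scholarship2 = c("*", NA, NA, "*", "*"),
    stringsAsFactors = FALSE)

dichotomizer <- function(database, variable) {
    # 0 by default; 1 where the value is "*" (NA comparisons are skipped).
    copy <- rep(0, nrow(database))
    copy[which(database[[variable]] == "*")] <- 1
    database[[variable]] <- copy
    database
}

# Left fold: dichotomizer(dichotomizer(db, "scholarship1"), "scholarship2")
final_db <- Reduce(dichotomizer, c("scholarship1", "scholarship2"), init = db)
final_db
```

To recode more columns, extend the character vector passed to `Reduce()`.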
