I use `ifelse` to sanitise data in a real-world database that is subject to typing errors.

Let's say I want to sanitise a value of X which I know can't be above 100 in real-world situations, so I just want to turn everything above 100 into NA so that it is not included in the analysis.

So I would do:

df$x <- ifelse(df$x > 100, NA, df$x)

This turns all values above 100 into NA and keeps the other ones.

This feels quite cumbersome and makes the code unreadable when I use the real variable names, which are quite long.

Is there any shorter way to do what I am trying to perform?

Thanks!

Rui Barradas
Edi Itelman
  • `ifelse` itself is a shorthand for 'if-else'. I don't see a problem for readability either. If you want an alternative, there would possibly be many (anything that works as a ternary operator), but `ifelse` is the best for its use case. –  Jan 18 '20 at 08:38
  • @chmod777 `ifelse` is not just a shorthand for `if-else`. The former is *vectorized*, the latter is not. See, for instance, [this post](https://stackoverflow.com/questions/17252905/else-if-vs-ifelse). – Rui Barradas Jan 18 '20 at 08:59
  • I know `ifelse` is a vector equivalent of `if-else`; what I was hinting at in the first line of my previous comment was just that it is shorter than if-else blocks. –  Jan 18 '20 at 10:23
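
The distinction discussed in these comments can be illustrated with a small sketch. The vector `x` is hypothetical example data, not taken from the question; the loop is just one way to spell out what `if`/`else` would need in order to do the same job.

```r
x <- c(50, 120, 99)  # hypothetical values, one of them out of range

# ifelse() evaluates the test for every element of the vector at once
vectorised <- ifelse(x > 100, NA, x)

# if/else works on a single TRUE/FALSE, so it needs an explicit loop
looped <- x
for (i in seq_along(x)) {
  if (x[i] > 100) looped[i] <- NA
}

# Both approaches produce the same vector
identical(vectorised, looped)
```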

4 Answers

4

The simplest way I am aware of is with the function `is.na<-`.

is.na(df$x) <- df$x > 100

Explanation.

Function `is.na<-` is a generic function defined in the file src/library/base/R/is.R as

`is.na<-` <- function(x, value) UseMethod("is.na<-")

One method is defined in the file, the default method.

`is.na<-.default` <- function(x, value)
{
    x[value] <- NA
    x
}

This is what S3's method dispatch mechanism calls in the answer's code line. An alternative way of calling it is the functional form.

`is.na<-`(df$x, df$x > 100)
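
A minimal sketch comparing the two calling styles, using a hypothetical data frame. One detail worth keeping in mind: the replacement form updates `df$x` directly, while the functional form only returns the modified vector, so its result has to be assigned back.

```r
df  <- data.frame(x = c(10, 250, 99, 101))  # hypothetical data
df2 <- df                                   # copy for the second variant

# Replacement form: modifies df$x directly
is.na(df$x) <- df$x > 100

# Functional form: returns the modified vector, so assign it back
df2$x <- `is.na<-`(df2$x, df2$x > 100)

# Both columns now hold NA where the value exceeded 100
identical(df$x, df2$x)
```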
Rui Barradas
  • @AntonySamuelB There are two `is.na` functions. This one assigns (hence the arrow `<-`) `NA`s to the vector elements indexed by the second argument. In this case it's a logical index. It could also be something like `x <- 1:10; is.na(x) <- c(2, 6, 9)`, an integer index vector. – Rui Barradas Jan 18 '20 at 10:22
  • That's a nice alternative :) –  Jan 18 '20 at 10:24
1

Use data.table

library(data.table)
setDT(df)  # convert df to a data.table by reference
df[x > 100, x := NA]

If the operation is to be applied to several columns,

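# column.names must already hold the candidate column names;
# this keeps only those that actually exist in df
column.names <- names(df)[names(df) %in% column.names]
for (i.col in column.names) {
  set(df, which(df[[i.col]] > 100), i.col, NA)
}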
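
For illustration, here is a self-contained sketch of the multi-column loop. The table `dt`, its columns, and the vector `cols` are hypothetical and only stand in for the real data; the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data with a few out-of-range entries
dt <- data.table(
  id     = 1:4,
  score  = c(55, 101, 98, 250),
  rating = c(10, 20, 300, 40)
)

# Hypothetical list of columns to sanitise
cols <- c("score", "rating")

for (col in cols) {
  # which() drops NA comparisons, so existing NAs are left untouched
  set(dt, i = which(dt[[col]] > 100), j = col, value = NA)
}

dt
```

Because `set()` updates the table by reference, no reassignment of `dt` is needed inside the loop.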

0

Try this; it should help.

df <- data.frame('X'=c(1,2,3,4,NA,100,101,102))

df$X <- as.numeric(df$X)

df$X <- ifelse((is.na(df$X) | df$X >100),NA,df$X)
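
As a side note, `ifelse()` already returns NA wherever its test is NA, so the explicit `is.na()` check appears to be optional here. A small sketch with the same example values (the variable names are only illustrative):

```r
X <- c(1, 2, 3, 4, NA, 100, 101, 102)

# With the explicit NA check, as in the code above
with_check <- ifelse(is.na(X) | X > 100, NA, X)

# Without it: the NA in the test propagates to the result
without_check <- ifelse(X > 100, NA, X)

identical(with_check, without_check)
```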
Tushar Lad
0

You can use the column index instead of column names then.

col <- which(names(df) == 'x')
df[[col]] <- df[[col]] * c(1, NA)[(df[[col]]  > 100) + 1]

Or

df[[col]] <- with(df, replace(df[[col]], df[[col]] > 100, NA))

This way you use the column name only once.
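
To see why the multiplication trick works, it may help to break it into steps. This is a sketch on a hypothetical vector `x`; the intermediate names `idx` and `multiplier` are introduced only for illustration.

```r
x <- c(5, 150, 80, 101)  # hypothetical values

# A logical plus 1 gives an index: FALSE -> 1, TRUE -> 2
idx <- (x > 100) + 1

# Index into c(1, NA): in-range values get 1, out-of-range values get NA
multiplier <- c(1, NA)[idx]

# Multiplying by 1 keeps the value, multiplying by NA yields NA
via_multiply <- x * multiplier

# replace() reaches the same result more directly
via_replace <- replace(x, x > 100, NA)

identical(via_multiply, via_replace)
```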

Ronak Shah