
Can you help me simplify this? I need to repeat this 238 times for df$X1 through df$X238:

df$X1[is.na(df$X1)] <- NA
df$X1[df$X1 == 'N/A'] <- NA
df$X1[df$X1 == 0] <- NA

df$X2[is.na(df$X2)] <- NA
df$X2[df$X2 == 'N/A'] <- NA
df$X2[df$X2 == 0] <- NA

...df$X238
Ray
  • Do you have different columns in the same data frame, or different data frames (i.e., multiple objects)? – Carlos Eduardo Lagosta Jun 24 '19 at 21:32
  • `X1[is.na(X1)] <- NA` does nothing, `is.na(X1)` was already `TRUE`. – Rui Barradas Jun 24 '19 at 21:35
  • Hi and thank you Rui Barradas and Carlos Eduardo Lagosta. I just updated it to show that they are different columns in the same data frame. My apologies for the confusion. X1 and X2 should have been df$X1 and df$X2. Does this change things? – Ray Jun 25 '19 at 02:15

2 Answers

As Rui Barradas pointed out, the first assignment doesn't do anything. The other two can be wrapped in a function and applied to every column at once:

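# Example data: X2 is not numeric because it contains 'N/A'
df <- data.frame(
  X1 = c(0, 1, 3, NA),
  X2 = c(1, 'N/A', 3, 3)
)

# Replace 'N/A' strings and zeros in a single column with NA
NA_subst <- function(x) {
  x[x == 'N/A'] <- NA
  x[x == 0] <- NA
  return(x)
}

# Apply to every column; assigning into df[] keeps the data frame structure
df[] <- lapply(df, NA_subst)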

eastclintw00d
  • Hi eastclintw00d. Thank you. Does this still apply now that I changed my code? I just updated it to show that they are different columns in the same data frame. My apologies for the confusion. X1 and X2 should have been df$X1 and df$X2. – Ray Jun 25 '19 at 02:16
  • Yes, it should still work fine. – eastclintw00d Jun 25 '19 at 05:12
  • Apologies because I am new. Any chance you can provide the new code given my changes? It appears I still have to list df$X1 through df$X238 in the first part of your code. – Ray Jun 25 '19 at 12:41
  • I don't think that you have to change anything. A `data.frame` is just a special type of `list`. So when you have a dataframe `df` with an arbitrary number of columns the code above should still work. – eastclintw00d Jun 25 '19 at 17:58
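
To make the last comment concrete, here is a minimal sketch of the same idea on a wide data frame. It assumes the columns are named X1 through X238 as in the question; the toy data, the extra `id` column and the helper vector `cols` are made up for illustration. Restricting the `lapply()` call to `cols` leaves any other columns untouched.

```r
# Toy data standing in for the real data frame (assumption: columns X1..X238)
df <- as.data.frame(matrix(c(0, 1, 3, NA), nrow = 4, ncol = 238))
names(df) <- paste0("X", 1:238)
df$X2 <- c("1", "N/A", "3", "3")  # one character column containing 'N/A'
df$id <- 0:3                      # hypothetical column that must keep its zero

# Same function as in the answer above
NA_subst <- function(x) {
  x[x == "N/A"] <- NA
  x[x == 0] <- NA
  x
}

# Build the column names once instead of typing df$X1 ... df$X238
cols <- paste0("X", 1:238)
df[cols] <- lapply(df[cols], NA_subst)

df$X1  # NA  1  3 NA
df$X2  # "1" NA  "3" "3"
df$id  # 0 1 2 3 (unchanged)
```

If every column in the data frame should be cleaned, `df[] <- lapply(df, NA_subst)` does the same without needing `cols`.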

Your question is about different columns, but your example is with different objects. I will answer both cases, but first a reproducible example:

set.seed(123)
X <- data.frame(
  a = sample(c(0:2, NA, 'N/A'), 4),
  b = sample(c(0:2, NA, 'N/A'), 4)
)
X -> Y1 -> Y2

> X
     a   b
1    1 N/A
2 <NA>   0
3  N/A   1
4    2   2

For all the columns in a data frame:

X[X == 0 | X == 'N/A'] <- NA
# `x[is.na(x)] <- NA` is redundant: those values are already NA

> X
     a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2
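
Since `sample()` can return different draws depending on the R version, here is a small sketch that rebuilds the example by hand and checks the result. The values are copied from the printed output above; `stringsAsFactors = FALSE` is an assumption made so the columns are plain character vectors on any R version.

```r
# Same values as the printed example, entered explicitly
X <- data.frame(
  a = c("1", NA, "N/A", "2"),
  b = c("N/A", "0", "1", "2"),
  stringsAsFactors = FALSE
)

# One assignment covers every column of the data frame
X[X == 0 | X == "N/A"] <- NA

# Count the missing values per column as a quick sanity check
colSums(is.na(X))  # a and b each have 2 missing values
```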

For multiple data frames

If instead you need to repeat operations on multiple data frames, it's advised to put them in a list:

# Anchor the pattern so only Y1, Y2, ... are collected, not every object containing 'Y'
df.list <- mget(objects(pattern = '^Y[0-9]+$'))

> lapply(df.list, function(x) replace(x, x == 0 | x == 'N/A', NA))
$Y1
    a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2

$Y2
    a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2

If you need to convert your list back to single objects, first assign the `lapply()` result back to `df.list` (the call above only prints it), then use:

list2env(df.list, .GlobalEnv) # this will overwrite objects with same names
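
Putting the multiple-data-frame steps together, a minimal sketch of the round trip could look like this. The two toy data frames stand in for `Y1` and `Y2`, and the `cleaned` environment is a hypothetical choice made here so that nothing in the workspace gets overwritten; writing to `.GlobalEnv` as above works the same way.

```r
# Two toy data frames (stand-ins for Y1 and Y2)
Y1 <- data.frame(
  a = c("1", NA, "N/A", "2"),
  b = c("N/A", "0", "1", "2"),
  stringsAsFactors = FALSE
)
Y2 <- Y1

# Collect the objects by explicit name
df.list <- mget(c("Y1", "Y2"))

# Assign the result; lapply() on its own does not modify df.list
df.list <- lapply(df.list, function(x) replace(x, x == 0 | x == "N/A", NA))

# Unpack into a separate environment instead of the global one
cleaned <- new.env()
invisible(list2env(df.list, envir = cleaned))

cleaned$Y1$a  # "1" NA  NA  "2"
Y1$a          # "1" NA  "N/A" "2" (original left as it was)
```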