Repeating same 3 commands for 238 columns, df$X1 through df$X238

Question

Can you help me simplify this? I need to repeat this 238 times for df$X1 through df$X238:

df$X1[is.na(df$X1)] <- NA
df$X1[df$X1 == 'N/A'] <- NA
df$X1[df$X1 == 0] <- NA

df$X2[is.na(df$X2)] <- NA
df$X2[df$X2 == 'N/A'] <- NA
df$X2[df$X2 == 0] <- NA

...df$X238

Are these different columns in the same data frame, or different data frames (i.e., multiple objects)? – Carlos Eduardo Lagosta Jun 24 '19 at 21:32
`X1[is.na(X1)] <- NA` does nothing: wherever `is.na(X1)` is `TRUE`, the value is already `NA`. – Rui Barradas Jun 24 '19 at 21:35
Hi and thank you Rui Barradas and Carlos Eduardo Lagosta. I just updated it to show that they are different columns in the same data frame. My apologies for the confusion. X1 and X2 should have been df$X1 and df$X2. Does this change things? – Ray Jun 25 '19 at 02:15

score 0 · Answer 1 · answered Jun 24 '19 at 21:39

As Rui Barradas pointed out, the first assignment doesn't do anything. The rest could be handled like this:

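# Small example data frame; X2 becomes a text column because of 'N/A'
df <- data.frame(
  X1 = c(0, 1, 3, NA),
  X2 = c(1, 'N/A', 3, 3)
)

# Replace 'N/A' and 0 with NA in a single column
NA_subst <- function(x) {
  x[x == 'N/A'] <- NA
  x[x == 0] <- NA
  return(x)
}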

as.data.frame(lapply(df, NA_subst))
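
Since `as.data.frame(lapply(df, NA_subst))` only returns a new data frame, here is a minimal sketch of storing the result back into `df` for every column at once. It reuses the `NA_subst` function above; the toy columns and `stringsAsFactors = FALSE` are illustrative choices, not part of the question's data.

```r
# Toy data standing in for the real 238-column data frame (illustrative only)
df <- data.frame(
  X1 = c(0, 1, 3, NA),
  X2 = c("1", "N/A", "3", "3"),
  stringsAsFactors = FALSE
)

# Same helper as in the answer: turn 'N/A' and 0 into NA
NA_subst <- function(x) {
  x[x == "N/A"] <- NA
  x[x == 0] <- NA
  x
}

# df[] keeps the data.frame structure while replacing every column,
# so no column has to be named explicitly
df[] <- lapply(df, NA_subst)
```

Because `lapply` walks over whatever columns `df` has, the same line should work unchanged whether there are 2 columns or 238.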

eastclintw00d

Hi eastclintw00d. Thank you. Does this still apply now that I changed my code? I just updated it to show that they are different columns in the same data frame. My apologies for the confusion. X1 and X2 should have been df$X1 and df$X2 – Ray Jun 25 '19 at 02:16
Yes, it should still work fine. – eastclintw00d Jun 25 '19 at 05:12
Apologies because I am new. Any chance you can provide the new code given my changes? It appears I still have to list df$X1 through df$X238 in the first part of your code. – Ray Jun 25 '19 at 12:41
I don't think that you have to change anything. A `data.frame` is just a special type of `list`. So when you have a data frame `df` with an arbitrary number of columns, the code above should still work. – eastclintw00d Jun 25 '19 at 17:58

score 0 · Answer 2 · answered Jun 24 '19 at 23:03

Your question is about different columns, but your example is with different objects. I will answer both cases, but first a reproducible example:

set.seed(123)
X <- data.frame(
  a = sample(c(0:2, NA, 'N/A'), 4),
  b = sample(c(0:2, NA, 'N/A'), 4)
)
X -> Y1 -> Y2

> X
     a   b
1    1 N/A
2 <NA>   0
3  N/A   1
4    2   2

For all the columns in a data frame:

X[X == 0 | X == 'N/A'] <- NA
# `X[is.na(X)] <- NA` is redundant: those values are already NA

> X
     a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2
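
If the data frame also holds columns that should stay untouched, the same replacement can be limited to `X1` through `X238` by building the column names. A hedged sketch: `df`, the `id` column, and the use of `%in%` in place of `==` are assumptions made for illustration, and only three `X` columns are created here to keep it short.

```r
# Illustrative data: an id column plus a few X columns (the real data has X1..X238)
df <- data.frame(
  id = c(0, 1, 2, 3),
  X1 = c("0", "1", "N/A", NA),
  X2 = c(0, 5, NA, 7),
  X3 = c("N/A", "a", "0", "b"),
  stringsAsFactors = FALSE
)

# Build the target column names; with the real data this would be paste0("X", 1:238)
cols <- paste0("X", 1:3)

# %in% never returns NA, so existing NAs are simply left alone
df[cols] <- lapply(df[cols], function(x) replace(x, x %in% c("0", "N/A"), NA))
```

Here the `id` column keeps its 0, while every 0 and 'N/A' in the selected `X` columns becomes `NA`.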

For multiple data frames

If instead you need to repeat operations on multiple data frames, it's advisable to put them in a list:

df.list <- mget(objects(pattern = 'Y'))

> lapply(df.list, function(x) replace(x, x == 0 | x == 'N/A', NA))
$Y1
    a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2

$Y2
    a    b
1    1 <NA>
2 <NA> <NA>
3 <NA>    1
4    2    2

If you need to convert your list back to single objects, first assign the result of `lapply` back to `df.list` (the call above only prints it), then use:

list2env(df.list, .GlobalEnv) # this will overwrite objects with same names
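
Put together, the round trip might look like the sketch below. The two small data frames, the explicit names passed to `mget`, and the separate environment `env` (hypothetical, used instead of `.GlobalEnv` so nothing gets overwritten by accident) are all assumptions for illustration.

```r
# Two small data frames standing in for the real objects (illustrative only)
Y1 <- data.frame(a = c("1", "N/A", "0"), b = c(0, 2, NA), stringsAsFactors = FALSE)
Y2 <- data.frame(a = c("0", "2", NA), b = c(1, 0, 3), stringsAsFactors = FALSE)

# Collect the objects by name into a named list
df.list <- mget(c("Y1", "Y2"))

# lapply() returns a new list, so store the result before writing it back
df.list <- lapply(df.list, function(x) replace(x, x == 0 | x == "N/A", NA))

# Write the cleaned data frames into a fresh environment;
# using .GlobalEnv here instead would overwrite Y1 and Y2
env <- new.env()
invisible(list2env(df.list, envir = env))
```

The cleaned copies are then available as `env$Y1` and `env$Y2`, while the originals stay as they were.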
