I have a data frame with 600 rows and 14 columns, and I need to use the `replace()` function in R to replace certain values in one column with NA. These values are outliers that I am masking with NA. The column is called Response.Size and the data frame is called mydata. The data points I need to replace with NA are in rows 54, 146 and 239, and their values are 206952, 198146 and 135523 respectively.

This is my first time using RStudio, so I am a bit confused. I have tried using `replace()` but can't seem to get it to work. Any help would be appreciated.

Carl Witthoft
Isaac
  • Hi Isaac! Welcome to StackOverflow. A couple of questions: 1. When you say "their corresponding values are 206952, 198146 and 135523 respectively", what do you mean by that? Are the values in the Response.Size column 206952, 198146 and 135523, or 54, 146 and 239? – Mark Jul 23 '23 at 03:53
  • Hi Mark. I mean the values are 206952, 198146 and 135523. 54, 146 and 239 are just their points out of 600 (i.e. 206952 is at 54/600 in the dataframe) – Isaac Jul 23 '23 at 03:57
  • ah cool, 54, 146 and 239 are the row numbers – Mark Jul 23 '23 at 03:59
  • 2. are you sure you want to replace outliers with NAs? Generally speaking removing outliers from your data (unless they genuinely were just incorrect measurements) is frowned upon in statistics – Mark Jul 23 '23 at 03:59
  • Rather than tell us your data, show us with a `dput` sample. See [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/1422451). – Parfait Jul 23 '23 at 04:00
  • I'm just following my assignment instructions haha. I know it is probably not best practice. The answer below from jay.sf solved my question. Thanks for helping though – Isaac Jul 23 '23 at 04:10

2 Answers


Try this.

> df
  X1     X2 X3
1  0      0  0
2  0      0  0
3  0 206952  0
4  0      0  0
5  0 198146  0
6  0      0  0
7  0 135523  0
> df$X2 <- replace(x=df$X2, list=df$X2 %in% c(206952, 198146, 135523), values=NA)
> df
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0 NA  0
4  0  0  0
5  0 NA  0
6  0  0  0
7  0 NA  0

Data:

df <- structure(list(X1 = c(0, 0, 0, 0, 0, 0, 0), X2 = c(0, 0, 206952, 
0, 198146, 0, 135523), X3 = c(0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-7L), class = "data.frame")
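
Since the question gives row numbers as well as values, the same call can also take positions: the `list` argument of `replace()` accepts any index vector, not only a logical one. A minimal sketch using the question's names, assuming `Response.Size` is numeric; the data frame built here is a made-up stand-in for the 600-row original, not the asker's actual data.

```r
# Stand-in for the asker's data: 600 rows, one numeric column (hypothetical values).
mydata <- data.frame(Response.Size = rep(1000, 600))
mydata$Response.Size[c(54, 146, 239)] <- c(206952, 198146, 135523)

# Replace by row position: `list` is an index vector into `x`.
outlier_rows <- c(54, 146, 239)
mydata$Response.Size <- replace(mydata$Response.Size, outlier_rows, NA)

# List the rows that are now NA.
which(is.na(mydata$Response.Size))
```

Indexing by position avoids retyping the values, but it silently hits the wrong rows if the data frame is reordered or filtered first; matching on values with `%in%` does not have that problem.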
jay.sf
  • That works, thank you so much – Isaac Jul 23 '23 at 04:03
  • This is clunky since it requires you to manually enter the items to replace. Maybe instead `list = df$X2 < max_not_outlier` would be more general. – Carl Witthoft Jul 23 '23 at 16:01
  • @CarlWitthoft You are right for the case when you want to remove outliers below a certain threshold, but it was asked about how to "replace certain values" in a data frame. – jay.sf Jul 23 '23 at 16:10
  • @jay.sf Understood, but I find it disappointing when folks answer exactly what's asked instead of answering what really **should** have been asked. :-( – Carl Witthoft Jul 23 '23 at 18:10
  • @CarlWitthoft OP might have a vector of outliers, e.g. from `boxplot(df$X2)$out`, and first calculating min and max from that is what I would consider clunky. BTW, `replace(x, x < outlier_min | x > outlier_max, NA_real_)` would be the correct solution for what you have in mind, but that isn't suitable here, because `outlier_min != min(outlier_vec)` and `outlier_max != max(outlier_vec)`. – jay.sf Jul 23 '23 at 18:37
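
The idea raised in the comments, taking the outlier vector from a boxplot and matching on it, might look like the sketch below. It uses `boxplot.stats()`, which returns the same `$out` component without drawing a plot. The sample vector `x` is hypothetical, and the default 1.5 × IQR whisker rule is an assumption about what counts as an outlier, not something the question specifies.

```r
# Hypothetical sample: one value far from the rest.
x <- c(10, 12, 11, 13, 12, 11, 206952, 10, 12)

# boxplot.stats() reports the points beyond the whiskers in $out.
outlier_vec <- boxplot.stats(x)$out

# Match on the flagged values themselves rather than on min/max thresholds.
cleaned <- replace(x, x %in% outlier_vec, NA)
cleaned
```

Matching with `%in%` keeps the call identical to the accepted answer; only the source of the value vector changes.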

There is already an accepted answer, but here is another base R option. Although the question specifically asks for a `replace` solution, this one uses `is.na<-`. The right-hand side of the assignment is a suitable index vector, in this case the logical vector returned by `%in%`.

The data comes from jay.sf's accepted answer.

df <- structure(list(X1 = c(0, 0, 0, 0, 0, 0, 0), 
                     X2 = c(0, 0, 206952, 0, 198146, 0, 135523), 
                     X3 = c(0, 0, 0, 0, 0, 0, 0)), 
                row.names = c(NA, -7L), class = "data.frame")

is.na(df$X2) <- df$X2 %in% c(206952, 198146, 135523)
df
#>   X1 X2 X3
#> 1  0  0  0
#> 2  0  0  0
#> 3  0 NA  0
#> 4  0  0  0
#> 5  0 NA  0
#> 6  0  0  0
#> 7  0 NA  0

Created on 2023-07-23 with reprex v2.0.2
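
The index vector on the right-hand side does not have to be logical; integer positions work too, which matches the row numbers given in the question. A minimal sketch on the same toy data, where rows 3, 5 and 7 hold the outliers (the data frame is rebuilt here only so the snippet runs on its own):

```r
# Same toy data as above, rebuilt so the snippet is self-contained.
df <- data.frame(X1 = 0, X2 = c(0, 0, 206952, 0, 198146, 0, 135523), X3 = 0)

# The right-hand side is a vector of row positions instead of a logical mask.
is.na(df$X2) <- c(3, 5, 7)
df$X2
```

For the question's data, the analogous call would presumably be `is.na(mydata$Response.Size) <- c(54, 146, 239)`.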

Rui Barradas