For a project I am working on, I need to randomly replace 5% of the values in a CSV file with NA. I am fine with methods that use either R or plain Microsoft Excel functions. I am not well versed in R, so I don't really know where to start.
Easier to help if you [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(yourdata)`, if that is not too large. We don't know what your "data-filled csv file" looks like or what needs to change - columns, rows, some combination? – neilfws Jul 18 '23 at 03:28
1 Answer
You could do something like this, which gives each cell a 5% chance of becoming NA:
```r
library(tidyverse)

df <- read_csv("mtcars.csv")

# Return x with probability (100 - pct_na)% and NA with probability pct_na%
sample_NA <- function(x, pct_na = 5) {
  sample(c(rep(x, 100 - pct_na), rep(NA, pct_na)), 1)
}

# using base R (apply() returns a matrix, so convert back to a data frame)
dfrandomised <- as.data.frame(apply(df, 1:2, sample_NA))

# or using tidyverse
dfrandomised <- df %>%
  rowwise() %>%
  mutate(across(everything(), sample_NA)) %>%
  ungroup()

dfrandomised
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1    NA     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  NA       6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0    NA     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  NA    3.15  22.9     1     0     4     2
#> 10  19.2     6   NA    123  3.92  3.44  18.3    NA     0     4     4
#> # ℹ 22 more rows

# note: this overwrites the input file; use a different name to keep the original
write_csv(dfrandomised, "mtcars.csv")
```
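If you need exactly 5% of the cells to be NA rather than 5% on average, one option is to pick a fixed number of cell positions and blank only those. This is a rough sketch, not a tested solution for your file: `set_exact_na` is a hypothetical helper name, it uses R's built-in `mtcars` data frame instead of your CSV so it runs on its own, and the seed value is arbitrary.

```r
# Hypothetical helper: set exactly pct_na percent of all cells (rounded) to NA
set_exact_na <- function(df, pct_na = 5) {
  n_cells <- nrow(df) * ncol(df)
  n_na <- round(n_cells * pct_na / 100)

  # Draw n_na distinct cell positions, numbered column by column
  idx <- sample.int(n_cells, n_na)
  rows <- (idx - 1) %% nrow(df) + 1
  cols <- (idx - 1) %/% nrow(df) + 1

  # Blank the chosen cells one at a time (works for data frames and tibbles)
  for (k in seq_along(idx)) {
    df[[cols[k]]][rows[k]] <- NA
  }
  df
}

set.seed(42)  # arbitrary seed, only so the result is repeatable
df_exact <- set_exact_na(mtcars)
sum(is.na(df_exact))  # 18, i.e. 5% of 32 * 11 = 352 cells, rounded
```

To use it on your own data, you would pass the data frame you read with `read_csv()` in place of `mtcars`. This assumes the data has no NA values to begin with.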

Mark