
For a project I am working on, I need to randomly replace 5% of the values in a CSV file with NA. I am fine with a method that uses either R or plain Microsoft Excel functions. I am not well versed in R, so I don't know where to start.

amahd
  • Easier to help if you [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(yourdata)`, if that is not too large. We don't know what your "data-filled csv file" looks like or what needs to change - columns, rows, some combination? – neilfws Jul 18 '23 at 03:28

1 Answer


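You could do something like this. Note that each cell is replaced independently with a 5% chance, so the overall share of NA values will be close to 5% but not exactly 5%: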

library(tidyverse)

df <- read_csv("mtcars.csv")

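# Return x unchanged with probability (100 - pct_na)%, or NA with probability pct_na%
sample_NA <- function(x, pct_na = 5) {
  sample(c(rep(x, 100-pct_na), rep(NA, pct_na)), 1)
}

# using base R (apply() returns a matrix, so convert back to a data frame for write_csv())
dfrandomised <- as.data.frame(apply(df, 1:2, sample_NA))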

# or using tidyverse
dfrandomised <- df %>% 
  rowwise() %>%
  mutate(across(everything(), sample_NA)) %>%
  ungroup()

# Example output of printing dfrandomised (the NA positions vary from run to run):
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1    NA     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  NA       6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0    NA     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95 NA     3.15  22.9     1     0     4     2
10  19.2     6   NA    123  3.92  3.44  18.3    NA     0     4     4
# ℹ 22 more rows
    
# write to a new file (illustrative name) so the original data is not overwritten
write_csv(dfrandomised, "mtcars_with_na.csv")
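
If the requirement is that exactly 5% of all cells (after rounding) become NA, rather than each cell having a 5% chance, one possible approach is to draw that many distinct cell positions and blank only those. The following is a minimal sketch under some assumptions: `set_exact_na` is a hypothetical helper name, not a function from any package, and it uses the built-in `mtcars` data set instead of the CSV file so that it runs on its own.

```r
# Sketch: set exactly pct_na percent of all cells (rounded) to NA.
# `set_exact_na` is a hypothetical helper, not part of any package.
set_exact_na <- function(df, pct_na = 5) {
  n_rows  <- nrow(df)
  n_cells <- n_rows * ncol(df)
  n_na    <- round(n_cells * pct_na / 100)

  # Draw distinct cell positions, numbered column by column
  idx  <- sample.int(n_cells, n_na)
  rows <- (idx - 1) %% n_rows + 1
  cols <- (idx - 1) %/% n_rows + 1

  # Blank the chosen cells one column at a time so column types are kept
  for (j in unique(cols)) {
    df[[j]][rows[cols == j]] <- NA
  }
  df
}

set.seed(42)  # optional, only makes the random draw repeatable
df_na <- set_exact_na(mtcars)
sum(is.na(df_na))  # 18, i.e. round(32 * 11 * 0.05)
```

Because the replacement happens column by column, this should also work when the columns have different types, whereas `apply()` first converts the whole data frame to a single-type matrix.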
Mark