3

I have an existing matrix and I want to replace some of its values with NAs, choosing the positions uniformly at random.

I tried to use the following, but it only replaced 392 values with NA, not 452 as I expected. What am I doing wrong?

N <- 452

ind1 <- (runif(N,2,length(macro_complet$Sod)))

macro_complet$Sod[ind1] <- NA

summary(macro_complet$Sod)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.3222   0.9138   1.0790   1.1360   1.3010   2.8610 392.0000 

My data looks like this

> str(macro_complet)
'data.frame':   1504 obs. of  26 variables:
 $ Sod                     : num  8.6 13.1 12 13.8 12.9 10 7 14.8 11.3 4.9 ...
 $ Azo                     : num  2 1.7 2.2 1.9 1.89 1.61 1.72 2.1 1.63 2 ...
 $ Cal                     : num  26 28.1 24 28.5 24.5 24 17.4 26.6 24.8 10.5 ...
 $ Bic                     : num  72 82 81 84 77 68 66 81 70 37.8 ...
 $ DBO                     : num  3 2.2 3 2.7 3.3 3 3.2 2.9 2.8 2 ...
 $ AzoK                    : num  0.7 0.7 0.9 0.8 0.7 0.7 0.7 0.9 0.7 0.7 ...
 $ Orho                    : num  0.3 0.2 0.31 0.19 0.19 0.2 0.16 0.24 0.2 0.01 ...
 $ Ammo                    : num  0.12 0.16 0.15 0.13 0.19 0.22 0.19 0.16 0.17 0.08 ...
 $ Carb                    : num  0.3 0.3 2 0.3 0.3 0.3 0.3 0.3 0.3 0.5 ...
 $ Ox                      : num  10.2 9.7 9.8 9.6 9.7 9.1 9.1 8.1 9.7 10.6 ...
 $ Mag                     : num  5.5 6.5 6.3 7 6.4 5.1 6 6.7 5.7 2 ...
 $ Nit                     : num  4.2 4.7 5.7 4.6 4.2 3.5 4.9 4.5 4.2 2.8 ...
 $ Matsu                   : num  17 9 24 15 17 19 20 19 13 3.9 ...
 $ Tp                      : num  10.5 9.7 11.9 12 12.9 11.2 12.8 13.7 11.5 10.6 ...
 $ Co                      : num  3 3.45 3.3 3.54 2.7 2.7 3.3 3.49 2.8 1.8 ...
 $ Ch                      : num  17 24 22 28 25 19 13 28 23 6.4 ...
 $ Cu                      : num  25 15 20 20 15 20 15 15 20 15 ...
 $ Po                      : num  3.5 3.8 4 3.6 3.8 3.7 3 4.2 3.7 0.4 ...
 $ Ph                      : num  0.2 0.17 0.2 0.14 0.18 0.2 0.17 0.17 0.17 0.01 ...
 $ Cnd                     : int  226 275 285 295 272 225 267 283 251 61 ...
 $ Txs                     : num  93 88 89 86 87 88 84 80 91 94 ...
 $ Niti                    : num  0.06 0.09 0.07 0.06 0.08 0.07 0.08 0.11 0.1 0.01 ...
 $ Dt                      : num  9 9.7 9 10.2 8 8 7 9.4 8.5 3 ...
 $ H                       : num  7.6 7.7 7.6 7.7 7.55 7.4 7.3 7.5 7.5 7.6 ...
 $ Dco                     : int  17 12 15 13 15 20 16 14 12 7 ...
 $ Sf                      : num  22 20.5 18 22.2 22.1 21 11.6 21.7 21.9 6.8 ...

I also tried to do this for only a single variable, but got the same result.

I converted my data frame into a matrix using

as.matrix(n1)

then I replaced some values for only one variable

N <- 300

ind <- (runif(N,1,length(n1$Sodium)))

n1$Sodium[ind] <- NA

However, using summary() I observed that only 262 values were replaced instead of 300 as expected. What am I doing wrong?

summary(n1$Sodium)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.3222   0.8976   1.0790   1.1320   1.3010   2.8610 262.0000
divibisan
Eva Serrano

3 Answers

7

Try this. It samples the positions of your matrix uniformly without replacement, so the same element is never chosen and replaced twice. If you want some other distribution, you can change the weights with the prob argument (see ?sample).

vec <- matrix(1:25, nrow = 5)
vec[sample(1:length(vec), 4, replace = FALSE)] <- NA

vec
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    6   NA   16   NA
[2,]   NA    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
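
To check the count on something closer in size to the data in the question, a minimal sketch follows. The matrix `m`, its dimensions, and the weight vector `w` are made up for illustration; only `sample()` and its `prob` argument come from the answer above.

```r
set.seed(42)                              # only to make the sketch reproducible

# Hypothetical stand-in for the real data: 1504 rows, 26 numeric columns
m <- matrix(rnorm(1504 * 26), nrow = 1504)

n_na <- 452                               # number of cells to blank out
idx  <- sample(length(m), n_na, replace = FALSE)
m[idx] <- NA

sum(is.na(m))                             # equals n_na, because the indices are distinct

# Non-uniform variant: illustrative weights that favour later cells
m2 <- matrix(rnorm(1504 * 26), nrow = 1504)
w  <- seq_along(m2)                       # assumed weights, not from the question
m2[sample(length(m2), n_na, replace = FALSE, prob = w)] <- NA
sum(is.na(m2))                            # still n_na; only the placement changes
```

Because the indices are drawn without replacement, the number of NAs always matches the number requested, whatever weights are used.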
divibisan
Roman Luštrik
  • Thanks, but does this generate uniformly random NAs? I really need the replaced positions to be uniformly distributed. – Eva Serrano Apr 09 '13 at 16:03
  • This is not working when I use a bigger matrix (1000 rows and 26 columns). Instead of replacing only 4 values, it replaces all the values of 4 columns with NAs. How can I change this to control the number of values that get replaced? – Eva Serrano Apr 09 '13 at 16:26
  • 3
    @EvaSerrano I would rather not guess at what's happening. Edit your question to include the code you're using, please. – Roman Luštrik Apr 09 '13 at 17:07
  • So I tried using `macro_complet[sample(length(macro_complet$Sodium), 250, replace =FALSE)] <- NA` – Eva Serrano Apr 10 '13 at 07:42
  • @EvaSerrano Can you edit your question and show us what `macro_complet` looks like (using `str()`)? – Roman Luštrik Apr 10 '13 at 15:56
  • @EvaSerrano Yes! I think it's not working because you have a data.frame. The method for replacing values in a data.frame differs from that for a matrix. You could probably get away with coercing to a matrix, replacing values as answered here, and coercing back to a data.frame. – Roman Luštrik Apr 11 '13 at 08:26
  • Coercing my data frame to a matrix with `as.matrix(macro_complet)` and replacing values as suggested here is not working for me. I'm still getting a different number of replaced values. Any suggestions? – Eva Serrano Apr 11 '13 at 09:34
  • @EvaSerrano Can you make a small reproducible example and demonstrate what you're doing? It's hard to tell what's going on from "it's not working". This post may help you along the way: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik Apr 11 '13 at 09:39
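
The whole-column replacement reported in the comments above is consistent with how a data frame behaves: its `length()` is its number of columns, and a single index selects columns rather than cells. A minimal sketch, assuming a small made-up data frame `df` in place of `macro_complet`, shows the effect and two possible ways around it:

```r
set.seed(1)

# Hypothetical small data frame standing in for macro_complet
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

length(df)                 # 3: the length of a data frame is its number of columns

# A single index on a data frame selects columns, so this blanks a whole column
df_cols <- df
df_cols[sample(length(df_cols), 1)] <- NA
colSums(is.na(df_cols))    # one column is entirely NA

# Option 1: index the column vector itself
df_one <- df
df_one$a[sample(nrow(df_one), 4)] <- NA
sum(is.na(df_one$a))       # 4

# Option 2: go through a matrix, keeping the result of as.matrix()
mat <- as.matrix(df)
mat[sample(length(mat), 4)] <- NA
df_all <- as.data.frame(mat)
sum(is.na(df_all))         # 4
```

Note that `as.matrix()` returns a new object, so its result has to be assigned for the conversion to have any effect.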
3

You must apply runif in the right spot, which is the index to vec. (The way you have it now, you are asking R to draw random numbers from a uniform distribution between NA and NA, which of course does not make sense, so it gives you back NaNs.)

Try instead:

        N  <-  5                                   # the number of random values to replace
      inds <- round ( runif(N, 1, length(vec)) )   # draw random values from [1, length(vec)]
 vec[inds] <- NA                                   # use the random values as indices into vec, marking the elements to replace

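Note that round(.) is not strictly necessary, since `[` accepts non-integer numerics. It truncates them toward zero rather than rounding, which makes the draw slightly non-uniform (the last index is almost never selected). Either way, the same index can be drawn more than once, so fewer than N values may end up replaced.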
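
The shortfall reported in the question (392 NAs instead of 452) is what repeated indices would produce. The sketch below, which assumes a made-up vector `x` of the same length as the column in the question, counts the distinct indices after truncation and compares that with the number of NAs.

```r
set.seed(123)

n_cells <- 1504                       # length of the column in the question
N       <- 452                        # number of NAs requested

x    <- rnorm(n_cells)                # hypothetical stand-in for macro_complet$Sod
inds <- runif(N, 1, n_cells)          # non-integer draws, as in the question

x[inds] <- NA                         # `[` truncates each index toward zero

n_distinct <- length(unique(trunc(inds)))
n_distinct                            # fewer than N whenever two draws share a cell
sum(is.na(x)) == n_distinct           # TRUE: one NA per distinct index

# Expected number of distinct cells when N draws land on k equally likely cells
k <- n_cells - 1                      # after truncation the last index is practically unreachable
k * (1 - (1 - 1 / k)^N)
```

For these numbers the expected count comes out at roughly 390, in line with the 392 seen in the question. Drawing the indices with `sample(..., replace = FALSE)` avoids the repeats.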

Ricardo Saporta
  • Applying this to one specific column replaces only 336 of the 350 values I want to replace. Any idea what I am doing wrong? `N <- 375` `inds <- (runif(N,1,length(vec$Sodium)))` `vec$Sodium[inds] <- NA ` – Eva Serrano Apr 10 '13 at 07:57
  • Eva, please see @Roman-Lustrik 's comment. You need to edit your question with your code and sample data. Alternatively, you can open a new question as a follow up to this question. – Ricardo Saporta Apr 10 '13 at 14:09
1

We could use

vec[sample(seq_along(vec), 4, replace = FALSE)] <- NA
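
Wrapped in a small function, the same idea can be applied to a single data frame column like the one in the question. This is only a sketch: `add_random_na` is a hypothetical helper name, not a base R function, and the column values are made up.

```r
# Hypothetical helper, not part of base R: set exactly n elements of x to NA
add_random_na <- function(x, n) {
  stopifnot(n <= length(x))                    # cannot blank more cells than exist
  x[sample(seq_along(x), n, replace = FALSE)] <- NA
  x
}

set.seed(7)
df <- data.frame(Sod = runif(1504, 0.3, 2.9))  # made-up values for illustration
df$Sod <- add_random_na(df$Sod, 452)
sum(is.na(df$Sod))                             # 452
```

The count is exact only if the input has no NAs to begin with; existing NAs would be added on top of it or overwritten.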
akrun
  • 1
    For my purposes, this solution worked best for me, because it guaranteed that I got exactly the right number of randomly-selected elements. – WinzarH Dec 01 '21 at 05:04