
This question is closely related to the following Stack Overflow question: Ignoring values or NAs in the sample function

Question: I have a matrix K with (say) 1528 rows and many (say 1000) columns. Many of the entries are NA. From each column I want to sample 40 elements as follows: if the column has at least 40 non-NA elements, I randomly sample 40 of them; if it has fewer than 40 (say k), I take all k elements plus 40-k NAs. I have tried the following code:

mysample <- function(x){
  if(sum(is.na(x))>1488){
    sum(x[!is.na(x)])
    return(c(x[!is.na(x)],rep(NA,40-sum(x[!is.na(x)]))))
  }
  return(sample(x[!is.na(x)],40))
}

J=apply(K, 2, mysample)

The fourth line of the function fails with an "invalid 'times' argument" error from rep. Can anyone make the code work? (I want to keep the NAs because I need to produce a 40 x 1000 matrix from it.)
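A minimal sketch of where the error comes from, using a made-up column (the values 25, 30 and 12 are hypothetical). The expression 40-sum(x[!is.na(x)]) adds up the non-NA values instead of counting them, so once those values total more than 40, rep() receives a negative times argument:

```r
# Made-up column: 1528 entries, only three of them non-NA
x <- rep(NA_real_, 1528)
x[1:3] <- c(25, 30, 12)

sum(x[!is.na(x)])   # sum of the values: 67
sum(!is.na(x))      # number of non-NA values: 3

# The question's code asks rep() for 40 - 67 = -27 copies, which rep() rejects
res <- tryCatch(rep(NA, 40 - sum(x[!is.na(x)])),
                error = function(e) conditionMessage(e))
res                 # "invalid 'times' argument"

# Counting the non-NA values instead gives the intended 37 padding NAs
length(rep(NA, 40 - sum(!is.na(x))))   # 37
```

Even when the values happen to total 40 or less, the padding length is still wrong, so the column would not end up with exactly 40 entries.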

Sorry, I found the answer myself:

mysample <- function(x){
  if(sum(is.na(x))>1488){
    return(c(x[!is.na(x)],rep(NA,40))[1:40])
  }
  return(sample(x[!is.na(x)],40))
}

J=apply(K, 2, mysample)

Thanks to all the previous discussions in the forum. My code was a direct extension of them. – Sourav Sarkar Jul 16 '16 at 14:35
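A sketch of the same pad-and-truncate idea without the hardcoded 1488, which only works when every column has exactly 1528 rows. The function name sample_or_pad, its argument n and the small test matrix are hypothetical, chosen for illustration:

```r
# Hypothetical generalization of the comment's fix: n replaces the
# hardcoded 40, and the check no longer depends on there being 1528 rows
sample_or_pad <- function(x, n = 40) {
  vals <- x[!is.na(x)]
  if (length(vals) < n) {
    # Too few values: keep all of them, append n NAs, keep the first n
    return(c(vals, rep(NA, n))[seq_len(n)])
  }
  # Enough values: draw n of them without replacement
  vals[sample.int(length(vals), n)]
}

# Made-up 100 x 3 matrix with 60, 5 and 0 non-NA values per column
set.seed(1)
K <- matrix(NA_real_, nrow = 100, ncol = 3)
K[1:60, 1] <- rnorm(60)
K[1:5, 2]  <- 1:5

J <- apply(K, 2, sample_or_pad, n = 40)
dim(J)   # 40 3
```

Drawing indices with sample.int() instead of calling sample(vals, n) sidesteps sample()'s habit of treating a single numeric value as 1:value, which only matters if n is 1.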

1 Answer


In your rep call, the times argument should be 40 minus the number of non-NA values, i.e. 40 - sum(!is.na(x)), not 40 minus their sum. I didn't test it, but I think this will fix the problem:

mysample <- function(x){
  if(sum(is.na(x))>1488){
    # times must be 40 minus the COUNT of non-NA values, not their sum
    return(c(x[!is.na(x)],rep(NA,40-sum(!is.na(x)))))
  }
  return(sample(x[!is.na(x)],40))
}

J=apply(K, 2, mysample)
Psidom
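As a further sketch, the answer's count-based fix can be combined with vapply(), which insists that every column return exactly n values. The helper name mysample2, its argument n and the 50-row test matrix are made up for illustration:

```r
# Hypothetical helper: the answer's fix, with 1488 replaced by a
# count-based check so it works for any number of rows
mysample2 <- function(x, n = 40) {
  vals <- x[!is.na(x)]
  if (length(vals) < n) {
    # times = n minus the COUNT of non-NA values
    return(c(vals, rep(NA_real_, n - length(vals))))
  }
  vals[sample.int(length(vals), n)]
}

# Made-up 50 x 4 matrix with 45, 10, 0 and 50 non-NA values per column
set.seed(42)
K <- matrix(NA_real_, nrow = 50, ncol = 4)
K[1:45, 1] <- runif(45)
K[1:10, 2] <- runif(10)
K[, 4]     <- runif(50)

# vapply() requires every column to yield exactly n doubles, so a wrong
# padding length stops with an error rather than producing a list
J <- vapply(seq_len(ncol(K)), function(j) mysample2(K[, j], n = 40),
            FUN.VALUE = numeric(40))
dim(J)   # 40 4
```

With apply(), columns of unequal length make the result a list rather than a matrix; vapply() reports that mismatch as an error, which would have pointed straight at the rep() bug.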