
I have a big data frame and I want to filter its columns. Basically, I want to keep the columns whose entries are larger than k in at least N% of the rows. Can someone help me do this in R? I'm new to R.

Robin
  • Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Jun 04 '16 at 17:17



It's good to have a reproducible example.

I will use the diamonds data set from the ggplot2 package as an illustration:

library(ggplot2)  # provides the diamonds data set
data(diamonds)


keepCol <- function(df, K, N){
  # df: data.frame
  # K:  threshold value
  # N:  required fraction of rows (e.g. 0.2 for 20%)

  # how many rows are in the data.frame
  cntRows <- nrow(df)
  # how many rows must exceed K (N% of the rows)
  minRows <- N * cntRows

  # keep only the numeric columns
  # (is.numeric() is TRUE for both double and integer columns, e.g. price)
  isNum <- vapply(df, is.numeric, logical(1))
  df <- df[, isNum, drop = FALSE]

  # for each numeric column, count the entries > K and compare with minRows
  # (NA entries are treated as not exceeding K)
  keep <- apply(df, 2, function(x) sum(x > K, na.rm = TRUE)) > minRows

  # keep only those columns; drop = FALSE keeps a data.frame even for one column
  df <- df[, keep, drop = FALSE]

  return(df)

}

keepCol(diamonds, K=4, N=0.2)
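The same filter can also be written without apply(), as a sketch: comparing the whole numeric part of the data frame with K gives a logical matrix, and colSums() counts the TRUE values per column. The function name keepColFast and the small toy data frame are hypothetical, for illustration only; like the version above, it assumes N is given as a fraction and that NA entries count as not exceeding K.

```r
# Hypothetical vectorized variant of keepCol(): keep the numeric columns in
# which the share of entries greater than K exceeds N (N as a fraction).
keepColFast <- function(df, K, N) {
  # keep only numeric (double or integer) columns
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  # as.matrix(num) > K is a logical matrix; colSums counts TRUEs per column,
  # and na.rm = TRUE means NA entries are not counted as exceeding K
  frac <- colSums(as.matrix(num) > K, na.rm = TRUE) / nrow(num)
  num[, frac > N, drop = FALSE]
}

# hypothetical toy data: column a exceeds 4 in 3 of 4 rows, b in 1 of 4 rows,
# and c is character so it is dropped before the comparison
toy <- data.frame(a = c(1, 5, 6, 7), b = c(0, 0, 9, 0), c = letters[1:4])
keepColFast(toy, K = 4, N = 0.5)
```

With N = 0.5 only column a passes (0.75 > 0.5), while with N = 0.1 both a and b would be kept.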
dimitris_ps