I have a big data frame and I want to filter its columns. Basically, I want to keep the columns whose entries are larger than k in N% of the rows. Can someone help me do this in R? I'm new to R.
Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Jun 04 '16 at 17:17
1 Answer
3
It's good to have a reproducible example. I will use the diamonds data set (from the ggplot2 package) as an illustration.
library(ggplot2)  # provides the diamonds data set
data(diamonds)

keepCol <- function(df, K, N) {
  # df: data.frame
  # K:  threshold value
  # N:  required fraction of rows (e.g. 0.2 for 20%)

  # Number of rows that must be exceeded to satisfy the criterion (N%)
  minRows <- N * nrow(df)

  # Keep only the numeric columns (integer columns count as numeric here)
  isNum <- vapply(df, is.numeric, logical(1))
  df <- df[, isNum, drop = FALSE]

  # For each numeric column, count the entries that are greater than K
  cntAbove <- vapply(df, function(x) sum(x > K, na.rm = TRUE), integer(1))

  # Keep only the columns where that count is larger than minRows
  df[, cntAbove > minRows, drop = FALSE]
}

keepCol(diamonds, K = 4, N = 0.2)
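
If you don't want to depend on ggplot2, the same idea can be sketched in base R on a small data frame. The `toy` data frame and the values of `K` and `N` below are made up for illustration and are not from the question. This version compares the share of rows above `K` with `N` directly, rather than counting rows. Note that with `na.rm = TRUE` the share is computed over the non-missing entries only.

```r
# Toy data frame, invented for illustration (not from the original post)
toy <- data.frame(
  a = c(1, 5, 6, 7, 8),       # 4 of 5 values are > 4
  b = c(1, 2, 3, 4, 5),       # 1 of 5 values is > 4
  c = c(9L, 9L, 9L, 1L, 1L),  # integer column, 3 of 5 values are > 4
  d = letters[1:5],           # non-numeric, always dropped
  stringsAsFactors = FALSE
)

K <- 4    # threshold value
N <- 0.5  # required fraction of rows

# Restrict to numeric columns, then compute the share of rows above K per column
num <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]
shareAbove <- colMeans(num > K, na.rm = TRUE)

# Keep the columns whose share is larger than N
kept <- num[, shareAbove > N, drop = FALSE]
names(kept)
```

Here `a` and `c` are kept, because 80% and 60% of their entries are above 4, while `b` (20%) is dropped. Use `>=` instead of `>` in the last step if "at least N%" is what you need.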

dimitris_ps