
I have a data frame with numerical columns in R. I want to see how many values in each column exceed some threshold (e.g. standardized values beyond ±2.5). Here is the output I want to display:

[Image: Output]

What function, or combination of functions, could I use to produce similar results, assuming all the columns in my data frame are numeric?

Thanks in advance :)

Dominic Comtois
Neil
  • Please try to do *some* research on your own. A quick search on Google provided some great examples: [one](http://www.statmethods.net/management/userfunctions.html), [two](http://www.r-bloggers.com/how-to-write-and-debug-an-r-function/), and [three](http://www.ats.ucla.edu/stat/r/library/intro_function.htm). Also, this is a great forum for questions about code you have actually attempted, so please tell us what you *have done so far* and what is not working in it. Last, please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – r2evans Mar 25 '15 at 05:44

2 Answers


This is easily done with `lapply()`. For example, to get the row indices of the values beyond ±2.5 in each column:

```r
# Generate sample data (10 columns x 100 rows), standard normal (mean 0, sd 1)
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))

# Get the row indices of values with |x| > 2.5, for each column in the data frame
lapply(my.df, function(x) which(abs(x) > 2.5))

# Example output (varies from run to run because the data are random):
# $V1
# integer(0)
# 
# $V2
# [1] 29 69
# 
# $V3
# [1] 85
# 
# $V4
# [1] 100
# 
# $V5
# [1] 11 40
# 
# $V6
# [1] 89
# 
# $V7
# [1] 67
# 
# $V8
# [1] 49 68
# 
# $V9
# integer(0)
# 
# $V10
# [1]  7 27
```
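The question talks about *standardized* values. The simulated data above are already on that scale (mean 0, sd 1), so `abs(x) > 2.5` can be applied directly. For raw data, a minimal sketch, assuming you want z-scores computed per column from that column's own mean and standard deviation, is to standardize with `scale()` first. `raw.df` and its mean/sd are hypothetical example values, and the seed only makes the sketch reproducible.

```r
# Hypothetical raw (unstandardized) data: 10 columns x 100 rows, mean 50, sd 10
set.seed(1)
raw.df <- as.data.frame(matrix(rnorm(n = 1000, mean = 50, sd = 10), ncol = 10))

# scale() centres each column on its mean and divides by its standard deviation,
# returning a matrix of z-scores; convert it back to a data frame so that
# lapply() iterates over columns rather than individual cells
z.df <- as.data.frame(scale(raw.df))

# Row indices where the z-score lies outside +-2.5, for each column
idx <- lapply(z.df, function(x) which(abs(x) > 2.5))
idx
```

Converting back with `as.data.frame()` matters: calling `lapply()` directly on the matrix returned by `scale()` would loop over every single cell instead of over columns.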

To get formatting closer to what you show in your question, ExperimenteR kindly suggested this (using the data.table package):

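```r
library(data.table)

# setDT() converts my.df to a data.table in place (by reference).
# The result has one row per original column (1 = V1, ..., 10 = V10),
# each holding that column's offending row indices
setDT(my.df)[, list(lapply(.SD, function(x) which(abs(x) > 2.5))), ]

#        V1
#  1:
#  2: 29,69
#  3:    85
#  4:   100
#  5: 11,40
#  6:    89
#  7:    67
#  8: 49,68
#  9:
# 10:  7,27
```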
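If you would rather not convert `my.df` to a data.table (note that `setDT()` modifies it in place), a minimal base-R sketch gives a similar layout: one row per column, with the offending row indices pasted into a comma-separated string. The `column` and `rows` names are illustrative, and the seed only makes the example reproducible.

```r
# Sample data as in the answer: 10 standard-normal columns x 100 rows
set.seed(42)
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))

# Row indices of values outside +-2.5, one list element per column
idx <- lapply(my.df, function(x) which(abs(x) > 2.5))

# One row per original column: its name and its offending rows joined by commas
# (columns with no such values get an empty string)
out <- data.frame(
  column = names(idx),
  rows = unname(vapply(idx, paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE
)
out
```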

To get the total count for each column instead, use:

```r
lapply(my.df, function(x) sum(abs(x) > 2.5))

# $V1
# [1] 0
# 
# $V2
# [1] 2
# 
# $V3
# [1] 1
# 
# $V4
# [1] 1
# 
# $V5
# [1] 2
# 
# $V6
# [1] 1
# 
# $V7
# [1] 1
# 
# $V8
# [1] 2
# 
# $V9
# [1] 0
# 
# $V10
# [1] 2
```
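Because the comparison is vectorized, the counts can also be returned as a single named vector without an explicit function. A minimal sketch, assuming the columns of interest are numeric as the question states: `colSums()` on the logical matrix produced by `abs(...) > 2.5`. The hypothetical `group` column only shows how to drop non-numeric columns first (`abs()` on a data frame fails if any column is non-numeric), and `na.rm = TRUE` skips missing values.

```r
# Sample data: 10 standard-normal columns x 100 rows
set.seed(42)
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))
# Hypothetical non-numeric column, to show the filtering step
my.df$group <- rep(c("a", "b"), 50)

# Keep only the numeric columns
num.df <- my.df[vapply(my.df, is.numeric, logical(1))]

# abs() works column-wise on a data frame; the comparison returns a logical
# matrix, and colSums() counts the TRUE values in each column
counts <- colSums(abs(num.df) > 2.5, na.rm = TRUE)
counts
```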
Dominic Comtois

You could also do this with reshape2 and plyr, by melting the data to long format and counting per original column:

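```r
library(reshape2)
library(plyr)

# Using the sample data from @Dominic Comtois
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))

# Long format: a "variable" column (original column name) and a "value" column
long.df <- melt(my.df)

# For each original column, count the values with |value| > 2.5
counts <- ddply(long.df, .(variable), summarise, n.exceed = sum(abs(value) > 2.5))
counts
```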
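The same long-format idea also works in base R, without reshape2 or plyr. A minimal sketch that swaps in base functions for `melt()`/`ddply()`: `stack()` turns the data frame into a `values` column and an `ind` column (the original column name), and `aggregate()` counts exceedances per group. The seed only makes the example reproducible.

```r
# Sample data: 10 standard-normal columns x 100 rows
set.seed(42)
my.df <- as.data.frame(matrix(rnorm(n = 1000), ncol = 10))

# Long format: 'values' holds the numbers, 'ind' the column each came from
long.df <- stack(my.df)

# Count values with |value| > 2.5 within each original column
counts <- aggregate(values ~ ind, data = long.df,
                    FUN = function(v) sum(abs(v) > 2.5))
counts
```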
AdrieanKhisbe
ck2578