Function to calculate how many observations in a data frame are beyond a particular value in R

Question

I have a data frame with numerical columns in R. I want to see how many values in each column of the data frame exceed some threshold (e.g., standardized values beyond ±2.5). Here is the output I want to display:

Output

What function, or combination of functions, could I use to produce similar results, assuming all the columns in my data frame are numeric?

Thanks in advance :)

Please, try to do *some* research on your own. A quick search on Google provided some great examples: [one](http://www.statmethods.net/management/userfunctions.html), [two](http://www.r-bloggers.com/how-to-write-and-debug-an-r-function/), and [three](http://www.ats.ucla.edu/stat/r/library/intro_function.htm). Also, this is a great forum for questions on code you have actually attempted, so please tell us what you *have done so far* and what is not working in it. Last, please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — r2evans, Mar 25 '15 at 05:44

Dominic Comtois · Accepted Answer · 2015-03-25T06:49:42.987

5

This is quite easily done with lapply:

# Generate sample data (10 columns x 100 rows) normally distributed around 0.
# No seed is set, so the indices below will differ from run to run.
my.df <- as.data.frame(matrix(rnorm(n=1000), ncol=10))

# Get the row indices of values beyond +-2.5, for each column in the df
lapply(my.df, function(x) which(abs(x) > 2.5))

# $V1
# integer(0)
# 
# $V2
# [1] 29 69
# 
# $V3
# [1] 85
# 
# $V4
# [1] 100
# 
# $V5
# [1] 11 40
# 
# $V6
# [1] 89
# 
# $V7
# [1] 67
# 
# $V8
# [1] 49 68
# 
# $V9
# integer(0)
# 
# $V10
# [1]  7 27
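If a single table of (row, column) positions is more convenient than a list, a base R alternative is to convert the data frame to a matrix and call `which()` with `arr.ind = TRUE`. This is a minimal sketch, assuming all columns are numeric; the small data frame `df` is made up for illustration:

```r
# Made-up data frame, small enough to check the result by eye
df <- data.frame(V1 = c(0.1, -3.0, 1.2),
                 V2 = c(2.7, 0.4, -2.6))

# Logical matrix: TRUE where the absolute value exceeds 2.5
beyond <- abs(as.matrix(df)) > 2.5

# arr.ind = TRUE returns one row per match, with "row" and "col" columns
pos <- which(beyond, arr.ind = TRUE)

# Replace column numbers by column names for readability
hits <- data.frame(row = pos[, "row"], column = names(df)[pos[, "col"]],
                   stringsAsFactors = FALSE)
hits
# row 2 of V1, then rows 1 and 3 of V2
```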

To get output formatted closer to what you showed in your question, ExperimenteR kindly suggested this:

library(data.table)
# setDT() converts my.df to a data.table by reference (in place)
setDT(my.df)[, list(lapply(.SD, function(x) which(abs(x) > 2.5))), ]

#        V1
#  1:      
#  2: 29,69
#  3:    85
#  4:   100
#  5: 11,40
#  6:    89
#  7:    67
#  8: 49,68
#  9:      
# 10:  7,27
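Without data.table, a similar one-row-per-column summary can be sketched in base R by collapsing each column's indices into a comma-separated string. This assumes every column is numeric; the helper name `rows_beyond` and the sample data are hypothetical:

```r
# Hypothetical helper: one row per column, listing the row indices
# whose absolute value exceeds the threshold, as a comma-separated string
rows_beyond <- function(df, threshold = 2.5) {
  idx <- lapply(df, function(x) which(abs(x) > threshold))
  data.frame(column = names(df),
             rows = vapply(idx, paste, character(1), collapse = ","),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Made-up data; V3 has no value beyond 2.5, so its 'rows' entry is empty
df <- data.frame(V1 = c(0.1, -3.0, 1.2),
                 V2 = c(2.7, 0.4, -2.6),
                 V3 = c(0.0, 1.0, -1.0))
out <- rows_beyond(df)
out
```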

To get the total count instead, for each column in the df, use:

lapply(my.df, function(x) sum(abs(x) > 2.5))

# $V1
# [1] 0
# 
# $V2
# [1] 2
# 
# $V3
# [1] 1
# 
# $V4
# [1] 1
# 
# $V5
# [1] 2
# 
# $V6
# [1] 1
# 
# $V7
# [1] 1
# 
# $V8
# [1] 2
# 
# $V9
# [1] 0
# 
# $V10
# [1] 2
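Since all columns are numeric, the counts can also be computed in one vectorized step: `abs(df) > 2.5` yields a logical matrix, and `colSums()` adds up the TRUEs per column, returning a named vector instead of a list. The question mentions standardized values; if the raw data are not already on that scale, `scale()` can convert each column to z-scores first. A minimal sketch with made-up data; using `na.rm = TRUE` is an assumption about how missing values should be treated:

```r
# Made-up data; column b contains a missing value
df <- data.frame(a = c(0.5, 3.1, -2.8, 1.0),
                 b = c(-2.6, NA, 0.2, 2.4))

# Count of |value| > 2.5 per column on the raw values, ignoring NAs
raw_counts <- colSums(abs(df) > 2.5, na.rm = TRUE)
raw_counts  # a = 2, b = 1

# For standardized values: scale() centers each column on its mean and
# divides by its standard deviation, returning a matrix of z-scores
set.seed(1)
big <- data.frame(x = rnorm(200, mean = 10, sd = 3),
                  y = rexp(200))
z_counts <- colSums(abs(scale(big)) > 2.5, na.rm = TRUE)
z_counts
```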

edited Mar 25 '15 at 06:49

answered Mar 25 '15 at 06:23

Dominic Comtois



Based on your solution, `library(data.table); setDT(my.df)[, list(lapply(.SD, function(x) which(abs(x) > 2.5))), ]` will give a data.table close to the OP's request. – ExperimenteR Mar 25 '15 at 06:45
Neat, I'll add it to my answer if you don't mind. – Dominic Comtois Mar 25 '15 at 06:47

Please do. This is almost the same as your answer. – ExperimenteR Mar 25 '15 at 06:49
@Dominic Thanks. It worked. :) – Neil Mar 25 '15 at 07:43
Glad it did. You're welcome! – Dominic Comtois Mar 25 '15 at 09:13

score 0 · Answer 2 · edited Mar 25 '15 at 07:56


You could also do this:

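library(reshape2); library(plyr)
# Same kind of sample data as in @Dominic Comtois's answer
my.df <- as.data.frame(matrix(rnorm(n=1000), ncol=10))

# Long format: one row per value, with 'variable' naming its source column
data <- melt(my.df)
# Count values beyond +-2.5 per variable; naming the column keeps the output readable
data2 <- ddply(data, .(variable), summarise, count = length(value[abs(value) > 2.5]))
data2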
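The same long-format idea also works in base R without reshape2 or plyr: `stack()` turns the data frame into a `values`/`ind` pair of columns, and `aggregate()` applies the count within each group. A sketch, again assuming all columns are numeric; the sample data are made up:

```r
# Made-up data for illustration
df <- data.frame(V1 = c(0.1, -3.0, 1.2),
                 V2 = c(2.7, 0.4, -2.6))

# Long format: 'values' holds the numbers, 'ind' the original column name
long <- stack(df)

# Count values beyond +-2.5 within each original column
counts <- aggregate(values ~ ind, data = long,
                    FUN = function(v) sum(abs(v) > 2.5))
counts  # V1 -> 1, V2 -> 2
```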

edited Mar 25 '15 at 07:56

AdrieanKhisbe


answered Mar 25 '15 at 07:30

ck2578


Welcome @ck2578! For your next post, I suggest having a look at http://stackoverflow.com/editing-help :) – AdrieanKhisbe Mar 25 '15 at 07:37
