Count number of rows in each column in a dataframe that satisfy a specific condition

Question

I'm new to R, so I'm sorry if this seems like a stupid question. I have a dataframe with 100 rows and 3 columns of data. I also have a vector with 3 thresholds, one for each column. How would I filter out the values of each column that are greater than that column's threshold?

Edit: Sorry for the incomplete question. What I would like to create is a function that takes a dataframe and a vector of thresholds as parameters and applies each threshold to its respective column of the dataframe (there is one threshold for every column). The number of elements in each column that "respect" their threshold should then be put in a vector. For example:

Column 1: values = 1, 2, 3; threshold = 3 (only values lower than 3).

Column 2: values = 4, 5, 6; threshold = 6 (only values lower than 6).

Output: a vector (2, 2), since two elements in each column are under their respective threshold.

Thank you everyone for the help!!

Can you provide the data and vector of thresholds using `dput`? And are you saying that you want to retain rows where all three values are above the three thresholds, erase values that fall below the threshold while retaining the rows, or something else? — jdobres, Jan 31 '22 at 22:37
Welcome to Stack Overflow. It would help to [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including code and example data in a plain text format - for example the output from `dput(yourdata)`. An indication of the desired output would help too. For example, do you want to keep only rows where each of the 3 columns is > threshold ? — neilfws, Jan 31 '22 at 22:37

score 0 · Accepted Answer · answered Feb 01 '22 at 00:00

Your example data:

    df <- data.frame(a = 1:3, b = 4:6)
    threshold <- c(3, 6)

One option is to use `sapply()`, which applies a function to each element of a list or vector. Here, I create a vector of column indices for `df` with `1:ncol(df)`. Inside the function, comparing a column with its threshold gives a logical vector, and `sum()` counts its `TRUE` values, which is the number of values below the threshold:

    col_num <- 1:ncol(df)
    sapply(col_num, function(x) {sum(df[, x] < threshold[x])})

Or, in a single line:

    sapply(1:ncol(df), function(x) {sum(df[, x] < threshold[x])})
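
Since the question asks for a function that takes a dataframe and a vector of thresholds, the same idea could be wrapped up roughly as follows. This is only a sketch: the name `count_below` and its argument names are made up for illustration, and it assumes a strict "lower than" comparison with exactly one threshold per column.

```r
# Hypothetical helper (name and arguments are assumptions, not from the question).
count_below <- function(df, thresholds) {
  # Assumption: exactly one threshold per column.
  stopifnot(ncol(df) == length(thresholds))
  # seq_along(df) gives the column indices and also works for zero columns.
  # vapply() enforces that each column yields a single integer count.
  vapply(seq_along(df),
         function(i) sum(df[[i]] < thresholds[i]),
         integer(1))
}

df <- data.frame(a = 1:3, b = 4:6)
count_below(df, c(3, 6))
# [1] 2 2
```

If a column contains `NA`, its count will be `NA` too; adding `na.rm = TRUE` to `sum()` would ignore missing values instead.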

You might want to edit the title of your question, as what you want to achieve is to count the number of rows in each column that satisfy a particular condition, not to filter the data frame with a combination of different thresholds for each column. — Javier Herrero, Feb 01 '22 at 00:05
