How do you drop all rows from a dataframe where the sum of a range of columns is 0?

Question

I have a dataframe with the columns
experimentResultDataColumns - faceGenderClk - 35 more columns ending with Clk - rougeClk - someMoreExperimentDataColumns
I am trying to drop all rows from the dataframe where the sum of the 50 columns from faceGenderClk through rougeClk (inclusive) is 0.

The dataframe holds data from an online study, and the "Clk" columns count how many times the participant clicked a specific slider. If no sliders were clicked, the data is invalid. (It's basically like someone handing back your survey without ever putting pen to paper.)

I was able to perform similar logic with a statement like this:
df<-df[!(df$screenWidth < 1280),]
to cut out all insufficiently sized screens, but I am unsure of how to perform this sum operation within that statement. I tried
df <- df[!(sum(df$faceGenderClk:df$rougeClk) > 0)]
but that doesn't work. (I'm not very good at R, I assume it definitely shouldn't work with that syntax)

The expected result is a dataframe with every row removed where the sum of all 50 values from faceGenderClk to rougeClk is 0.

EDIT:
data: https://pastebin.com/SLAmkHk5
With this data, the code should drop the second row.

code so far:

df <- read.csv("./trials.csv")
SECONDS_IN_AN_HOUR <- 60*60
MILLISECONDS_IN_AN_HOUR <- SECONDS_IN_AN_HOUR * 1000
library(dplyr)
#levels(df$latinSquare) <- c("AlexaF", "SiriF", "CortanaF", "SiriM", "GoogleF", "RobotM") ignore this since I faked the dataset to protect participants' personal data
df<-df[!(df$timeMainSessionTime > 6 * MILLISECONDS_IN_AN_HOUR),]
df<-df[!(df$screenWidth < 1280),]

The accepted answer (as of this edit) solves the problem with:

cols = grep(pattern = "Clk$", names(df), value=TRUE)
sums = rowSums(df[cols])
df <- df[sums != 0, ]

Please make this question *reproducible*. This includes sample code attempted (including listing non-base R packages), sample *unambiguous* data (e.g., `dput(head(x))` or `data.frame(x=...,y=...)`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. — r2evans, Aug 08 '19 at 16:54
@r2evans I edited my post and provided fake data as well as my code up to that point in the program — c4edus, Aug 08 '19 at 17:16
In the future, please add sample data directly to the question. It's nice to have sample data in the question to keep things all in one place. It's also nice to make a *minimal* sample of data - for something like this 3 rows and 5 columns would be plenty big enough to demonstrate the problem and solution. Your question was exceptionally clear - which helped me answer without sample data - but in general, a minimal sample in the question is best. — Gregor Thomas, Aug 08 '19 at 20:47

score 2 · Accepted Answer · answered Aug 08 '19 at 16:56

First, get the names of the columns you want to check. Then add up those columns row by row and subset on the result.

# columns that end in Clk
cols = grep(pattern = "Clk$", names(df), value = TRUE)

# add them up
sums = rowSums(df[cols])

# subset
df[sums != 0, ]
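
To see the approach end to end, here is a minimal sketch on made-up data. The column `voiceAgeClk`, the column `trialTime`, and all values are hypothetical stand-ins, not taken from the linked dataset. It also shows an alternative that selects the contiguous block from `faceGenderClk` to `rougeClk` by position, which assumes the columns really are adjacent as the question describes, and it uses `na.rm = TRUE` on the assumption that a missing click count should be treated as 0.

```r
# Hypothetical toy data: three click counters between two other columns
df <- data.frame(
  screenWidth   = c(1920, 1280, 1440),
  faceGenderClk = c(2, 0, 0),
  voiceAgeClk   = c(0, 0, 1),
  rougeClk      = c(1, 0, 0),
  trialTime     = c(310, 95, 240)
)

# Option 1: select by name suffix, as in the answer
cols <- grep("Clk$", names(df), value = TRUE)

# Option 2: select the contiguous range by position
# (assumes faceGenderClk..rougeClk are adjacent columns)
first <- match("faceGenderClk", names(df))
last  <- match("rougeClk", names(df))
range_cols <- names(df)[first:last]

# Row-wise sum of the click columns; na.rm = TRUE counts NA as 0 (assumption)
sums <- rowSums(df[cols], na.rm = TRUE)

# Keep only rows with at least one click; assign the result to persist it
kept <- df[sums != 0, ]
```

On this toy data both options pick the same three columns, and only the second row, where every click counter is 0, is dropped. Note that `df[sums != 0, ]` on its own only prints the result; assign it back (for example `df <- df[sums != 0, ]`) to actually modify the dataframe.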

Gregor Thomas

