Removing columns where all entries are 0 in R

Question

I have a large binary dataset with 38 rows and 4063 columns (the first column is an ID list). I'm trying to remove the columns where all entries are 0 by running `new.df <- df[, -(which(colSums(df[,-1]) == 0))]`, but for some reason none of the all-zero columns are removed; other columns are removed instead. Running `length(colSums(df[,-1]) == 0)` gives me 1691, and after running my code the new matrix is 38 by 1692. I've also tried `new.df <- subset(df, select = colSums(df[,-1]) != 0)` and get the same result. What might the problem be, or is there better code to use?

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Feb 03 '20 at 21:37
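
A minimal reproducible example along those lines, using a hypothetical toy data frame `df` (not the asker's data), shows the symptom: the all-zero column survives and the column to its left is dropped instead.

```r
# Hypothetical toy data: an ID column plus three binary columns.
# Column "b" (the 3rd column of df) is all zeros.
df <- data.frame(
  id = c("s1", "s2", "s3"),
  a  = c(1, 0, 1),
  b  = c(0, 0, 0),
  c  = c(0, 1, 1)
)

# The original attempt: which() returns positions within df[, -1],
# so "b" is reported as position 2, which is column "a" in df
zero_idx <- which(colSums(df[, -1]) == 0)
new.df <- df[, -zero_idx]
names(new.df)  # "id" "b" "c": the all-zero column is still there
```
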
You're introducing an off-by-one error: the first column of `df[, -1]` is the second column of `df`. You're also comparing the column sums to 0, but you should be taking the `colSums` of `df == 0` and comparing it to the number of rows. You could try `new.df <- df[, -(1 + which(colSums(df[, -1] == 0) == nrow(df)))]` — Gregor Thomas, Feb 03 '20 at 21:37
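
A sketch of that fix on the same hypothetical toy data: count the zeros per column, treat a column as all-zero when the count equals `nrow(df)`, and shift the indices by 1. The `length()` guard is an addition not in the comment: negating an empty `which()` result selects no columns at all.

```r
# Hypothetical toy data: ID column plus three binary columns; "b" is all zeros
df <- data.frame(
  id = c("s1", "s2", "s3"),
  a  = c(1, 0, 1),
  b  = c(0, 0, 0),
  c  = c(0, 1, 1)
)

# Count the zeros in each data column; a column is all-zero when that
# count equals the number of rows
all_zero <- which(colSums(df[, -1] == 0) == nrow(df))

# Shift by 1: position k in df[, -1] is position k + 1 in df
drop_idx <- 1 + all_zero

# Only drop when something was found; drop = FALSE keeps a data frame
new.df <- if (length(drop_idx) > 0) df[, -drop_idx, drop = FALSE] else df
names(new.df)  # "id" "a" "c"
```
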
Easier, actually, to count the non-zeros and keep all columns with a non-zero count of non-zero values (plus the first column): `new.df <- df[, c(1, 1 + which(colSums(df[-1] != 0) > 0))]` — Gregor Thomas, Feb 03 '20 at 21:44
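
The keep-the-non-zeros idea can also be written with a logical index instead of `which()`; below is a sketch on the same hypothetical toy data. Because it selects columns rather than negating positions, it needs no special case when no column is all zero.

```r
# Hypothetical toy data: ID column plus three binary columns; "b" is all zeros
df <- data.frame(
  id = c("s1", "s2", "s3"),
  a  = c(1, 0, 1),
  b  = c(0, 0, 0),
  c  = c(0, 1, 1)
)

# TRUE for each data column with at least one non-zero entry
has_nonzero <- colSums(df[, -1] != 0) > 0

# Prepend TRUE so the ID column is always kept
new.df <- df[, c(TRUE, has_nonzero), drop = FALSE]
names(new.df)  # "id" "a" "c"
```
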
@Gregor-reinstateMonica thanks! The `1 +` worked, but with the original `colSums` test: `new.df <- df[, -(1 + (which(colSums(df[,-1]) == 0)))]` — rholeepoly, Feb 03 '20 at 21:45
Yeah, testing `colSums() == 0` will work if you don't have any negative values, but it's not a good general solution, since positive and negative entries in a column can cancel out to a sum of 0. — Gregor Thomas, Feb 03 '20 at 21:48
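
That caveat doesn't bite on purely binary data, but a hypothetical column holding `1` and `-1` shows it: the sum test flags it as all-zero, while counting non-zero entries does not.

```r
# Hypothetical data with a negative value
df <- data.frame(
  id = c("s1", "s2"),
  a  = c(1, -1),  # sums to 0 but is not all zeros
  b  = c(0, 0)    # genuinely all zeros
)

sum_test   <- colSums(df[, -1]) == 0       # a: TRUE (wrong), b: TRUE
count_test <- colSums(df[, -1] != 0) == 0  # a: FALSE, b: TRUE
sum_test
count_test
```
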
