I have a large binary dataset of 38 rows and 4063 columns (the first column is a list of IDs). I'm trying to get rid of the columns where all entries are 0 by running `new.df <- df[, -(which(colSums(df[,-1]) == 0))]`, but for some reason none of the all-zero columns are removed; other columns are removed instead. Running `length(colSums(df[,-1]) == 0)` gives me 1691, yet after running my code the new matrix is 38 by 1692. I've also tried `new.df <- subset(df, select = colSums(df[,-1]) != 0)` and get the same result. Can someone help me figure out what the problem might be, or suggest better code to use?

rholeepoly
-
It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 03 '20 at 21:37
-
You're introducing an off-by-one error. The first column of `df[, -1]` is the second column of `df`. You're also comparing the column sums to 0, but you should be taking the `colSums` of `df == 0` and checking whether every row in a column is zero. You could probably try `new.df <- df[, -(1 + which(colSums(df[, -1] == 0) == nrow(df)))]` – Gregor Thomas Feb 03 '20 at 21:37
-
Easier, actually, to count the non-zeros, and keep all columns with a non-zero count of non-zero values (and the first column): `new.df <- df[, c(1, 1 + which(colSums(df[-1] != 0) > 0))]` – Gregor Thomas Feb 03 '20 at 21:44
-
@Gregor-reinstateMonica Thanks! The `1 +` worked, but with the original `colSums` format: `new.df <- df[, -(1 + (which(colSums(df[,-1]) == 0)))]` – rholeepoly Feb 03 '20 at 21:45
-
Yeah, testing `colSums() == 0` will work if you don't have any negative values, but it's not a good general solution. – Gregor Thomas Feb 03 '20 at 21:48
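
The off-by-one issue and the "count the non-zeros" fix discussed in the comments can be reproduced on a small example. This is only a sketch: the toy data frame and the names `zero_pos`, `buggy`, `keep`, `fixed` and `mixed` are made up for illustration and do not come from the asker's data.

```r
# Toy data (hypothetical): an ID column plus four columns; b and d are all zero.
df <- data.frame(
  id = c("r1", "r2", "r3"),
  a  = c(1, 0, 1),
  b  = c(0, 0, 0),
  c  = c(0, 1, 0),
  d  = c(0, 0, 0)
)

# Original attempt: the positions refer to df[, -1], so in df they are one too low.
zero_pos <- which(colSums(df[, -1]) == 0)  # b and d are positions 2 and 4 of df[, -1]
buggy <- df[, -zero_pos]                   # drops columns 2 and 4 of df, i.e. a and c

# Count non-zero entries per column and keep the ID column plus
# every column that has at least one non-zero entry.
keep <- colSums(df[, -1] != 0) > 0
fixed <- df[, c(TRUE, keep)]

# Why summing the values is fragile: this column is not all zero but sums to 0.
mixed <- data.frame(x = c(1, -1, 0))

print(names(buggy))
print(names(fixed))
print(colSums(mixed) == 0)
print(colSums(mixed != 0) > 0)
```

Indexing with a logical vector also sidesteps a separate trap with the `-which(...)` form: if no column is all zero, `which()` returns an empty vector, and `df[, -integer(0)]` selects zero columns rather than all of them.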