I have a huge data.frame with around 200 variables, each represented by a column. Unfortunately, the data is sourced from a poorly formatted data dump (and hence can't be modified) which represents both missing values and zeroes as 0.
The data was recorded every 5 minutes for a month (288 observations per day). A day-long run of only 0s most likely means the counter was not functioning that day, so those 0s should really be treated as NAs.
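To make the recoding idea concrete, here is a minimal sketch in base R of turning long zero runs into NA for a single column. The helper name zero_runs_to_na and its argument k are made up for illustration, and it assumes the column is a plain numeric vector.

```r
# Hypothetical helper: replace every run of at least k consecutive zeros with NA.
zero_runs_to_na <- function(x, k) {
  # Run-length encode the "is a zero" indicator (existing NAs count as non-zero)
  r <- rle(!is.na(x) & x == 0)
  # Keep only the zero runs that are long enough
  r$values <- r$values & r$lengths >= k
  # Expand back to one flag per element and blank out the flagged positions
  x[inverse.rle(r)] <- NA
  x
}

zero_runs_to_na(c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3), k = 4)
# 4 5 8 2 NA NA NA NA 6 3
```

Shorter zero runs are left alone, so isolated genuine zeros would survive.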
I want to find (and remove) columns that have at least 288 consecutive 0s at any point. Or, more generally: how can I remove from a data.frame every column that contains k or more consecutive 0s?
I'm relatively new to R, and any help would be greatly appreciated. Thanks!
EDIT: Here is a reproducible example. With k = 4, I would like to remove columns A and B (but not C, since its 0s are not consecutive).
df <- data.frame(A = c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3),
                 B = c(3, 0, 0, 0, 0, 6, 8, 2, 1, 0),
                 C = c(4, 5, 6, 0, 3, 0, 2, 1, 0, 0),
                 D = 1:10)
df
   A B C  D
1  4 3 4  1
2  5 0 5  2
3  8 0 6  3
4  2 0 0  4
5  0 0 3  5
6  0 6 0  6
7  0 8 2  7
8  0 2 1  8
9  6 1 0  9
10 3 0 0 10
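For reference, one possible approach (a sketch, not necessarily the best one) is to run-length encode each column with base R's rle() and flag columns that have a long enough run of zeros. The names has_zero_run, bad and df_clean are made up for this example.

```r
# Hypothetical helper: TRUE if x contains at least k consecutive zeros.
has_zero_run <- function(x, k) {
  # rle() collapses the zero/non-zero indicator into runs with their lengths
  r <- rle(!is.na(x) & x == 0)
  any(r$values & r$lengths >= k)
}

df <- data.frame(A = c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3),
                 B = c(3, 0, 0, 0, 0, 6, 8, 2, 1, 0),
                 C = c(4, 5, 6, 0, 3, 0, 2, 1, 0, 0),
                 D = 1:10)

k <- 4
# One logical per column: does it contain a run of k or more zeros?
bad <- vapply(df, has_zero_run, logical(1), k = k)
# drop = FALSE keeps a data.frame even if only one column survives
df_clean <- df[, !bad, drop = FALSE]
names(df_clean)
# "C" "D"
```

For the real data, k would be 288 (24 hours of 5-minute readings).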