Ignore second or more consecutive 0

Question

I am working on a very basic example and want to filter the following data:

count   SN  data.stamp 
1   00601   2018-07-26 13:38:39       
0   00601   2018-11-05 23:00:09       
0   00601   2018-11-05 23:00:16        
4   00601   2018-11-12 23:00:05        
0   00601   2018-12-12 23:00:05        
5   00601   2018-11-12 23:00:05        
0   00601   2018-12-12 23:00:05
0   00601   2018-11-12 23:00:05        
0   00601   2018-12-12 23:00:05

Expected output:

count   SN  data.stamp 
1   00601   2018-07-26 13:38:39       
0   00601   2018-11-05 23:00:09       
4   00601   2018-11-12 23:00:05        
0   00601   2018-12-12 23:00:05        
5   00601   2018-11-12 23:00:05        
0   00601   2018-12-12 23:00:05

I want to keep only a single row for each run of 0 counts. If there are several consecutive 0 values, only the first should be kept and the remaining 0 counts ignored.

Basically, I want only the first zero of each run of consecutive zeros.

I tried using rle, but it only gives me the values and lengths of the runs, and I want to extract the rows from the data.frame. I could write a for loop to check each row, but I am looking for a quick and short way.

@akrun: Pattern is `count`. – Saurabh Chauhan Jan 11 '19 at 09:32

Cath · Answer 1 · 2019-01-11T09:45:49.877

5

In base R, you can subset your data.frame to get only the rows for which count is different from 0 or count is 0 but the previous row was different from zero:

df[df$count!=0 | (df$count==0 & c(TRUE, head(df$count, -1)!=0)), ]
# (or: subset(df, count!=0 | (count==0 & c(TRUE, head(count, -1)!=0))))

#  count  SN          data.stamp
#1     1 601 2018-07-26 13:38:39
#2     0 601 2018-11-05 23:00:09
#4     4 601 2018-11-12 23:00:05
#5     0 601 2018-12-12 23:00:05
#6     5 601 2018-11-12 23:00:05
#7     0 601 2018-12-12 23:00:05
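To see why this condition works, it may help to split it into named pieces. This is only an illustrative sketch: the vector names `is_zero`, `prev_nonzero` and `keep` are mine, and the counts are copied from the question's sample data.

```r
# Same counts as in the question's sample data.
count <- c(1L, 0L, 0L, 4L, 0L, 5L, 0L, 0L, 0L)

is_zero <- count == 0

# TRUE where the previous row is non-zero. The first row has no
# predecessor, so it is treated as if a non-zero row came before it.
prev_nonzero <- c(TRUE, head(count, -1) != 0)

# Keep every non-zero row, plus a zero only when it starts a run of zeros.
keep <- !is_zero | (is_zero & prev_nonzero)

which(keep)
# [1] 1 2 4 5 6 7
```

Rows 3, 8 and 9 are dropped because each of them is a zero preceded by another zero.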

edited Jan 11 '19 at 09:45

answered Jan 11 '19 at 09:42


@SaurabhChauhan your current problem is a logical problem which does not need either external packages or more complex functions ;-) – Cath Jan 11 '19 at 09:46
Thanks for the perfect answer. Both answers are fine, but I can accept only one (a limitation of Stack Overflow :) ); otherwise I would accept both (yours and akrun's). Thanks for your time and effort. In the end, I am accepting akrun's answer because he was quicker. – Saurabh Chauhan Jan 11 '19 at 09:58

akrun · Accepted Answer · 2019-01-11T09:53:14.677

We can use rleid from data.table to create a logical vector for filtering the rows

library(dplyr)
df1 %>%
   filter(!duplicated(cbind(data.table::rleid(count), SN)))

To target the zeros specifically, rleid can be applied to the logical vector count == 0

df1 %>% 
  filter(!duplicated(cbind(data.table::rleid(count == 0), SN)))

rleid compares adjacent elements and increases the ID by 1 whenever the value changes, i.e.

library(data.table)
v1 <- c(1, 0, 0, 5, 4, 5, 5)
rleid(v1)
#[1] 1 2 2 3 4 5 5

Now, all adjacent duplicate elements are given the same ID. If we specifically want to recognize runs of 0s as duplicates

rleid(v1 == 0)
#[1] 1 2 2 3 3 3 3

Here, there are only two values i.e. TRUE/FALSE

v1 == 0
#[1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

Wrapping the run IDs with duplicated returns a logical vector that flags every repeat of an ID; negating it gives a logical index for the first row of each run
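As a small sketch of that last step on `v1`, here is the same idea in base R only. The helper `run_id` is a hypothetical stand-in for rleid, built from rle, so the snippet does not need data.table.

```r
v1 <- c(1, 0, 0, 5, 4, 5, 5)

# Base R stand-in for rleid(): number the runs of identical adjacent values.
run_id <- function(x) with(rle(x), rep(seq_along(values), lengths))

# TRUE for the first element of each TRUE/FALSE run of v1 == 0.
first_of_run <- !duplicated(run_id(v1 == 0))
v1[first_of_run]
# [1] 1 0 5

# Also keeping every non-zero value restores the adjacent non-zero entries.
v1[first_of_run | v1 != 0]
# [1] 1 0 5 4 5 5
```

One thing to watch: because `v1 == 0` has only two values, adjacent non-zero entries also share a run ID, so the first form drops 4, 5, 5 here. The question's sample data never has two non-zero counts in a row, so the result is the same there; if your data can, the second form (or the explicit condition in the other answer) is the safer choice.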

If we want a base R solution, this can be done with rle. Build the run IDs by replicating the run index (seq_along(values)) by the run lengths, then get the logical vector by wrapping with duplicated as before

i1 <- with(rle(!df1$count), rep(seq_along(values), lengths))
i2 <- !duplicated(cbind(i1, df1$SN))
df1[i2, ]
#  count  SN          data.stamp
#1     1 601 2018-07-26 13:38:39
#2     0 601 2018-11-05 23:00:09
#4     4 601 2018-11-12 23:00:05
#5     0 601 2018-12-12 23:00:05
#6     5 601 2018-11-12 23:00:05
#7     0 601 2018-12-12 23:00:05
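The sample data has a single SN, so the effect of including SN in the cbind is not visible there. A minimal sketch with made-up data (the `toy` data frame below is hypothetical, not from the question) shows what it changes when a run of zeros spans two serial numbers:

```r
# Hypothetical data: one run of zeros that crosses from SN 1 into SN 2.
toy <- data.frame(count = c(3L, 0L, 0L, 0L, 0L, 2L),
                  SN    = c(1L, 1L, 1L, 2L, 2L, 2L))

i1 <- with(rle(!toy$count), rep(seq_along(values), lengths))

# Run ID alone: only the very first zero of the whole run survives.
which(!duplicated(i1))
# [1] 1 2 6

# Run ID plus SN: the first zero of each serial number survives.
which(!duplicated(cbind(i1, toy$SN)))
# [1] 1 2 4 6
```

Whether row 4 should be kept depends on whether runs are meant to restart for each SN; the question does not say, so treat this as an illustration of what the cbind does rather than a requirement.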

data

df1 <- structure(list(count = c(1L, 0L, 0L, 4L, 0L, 5L, 0L, 0L, 0L), 
    SN = c(601L, 601L, 601L, 601L, 601L, 601L, 601L, 601L, 601L
    ), data.stamp = c("2018-07-26 13:38:39", "2018-11-05 23:00:09", 
    "2018-11-05 23:00:16", "2018-11-12 23:00:05", "2018-12-12 23:00:05", 
    "2018-11-12 23:00:05", "2018-12-12 23:00:05", "2018-11-12 23:00:05", 
    "2018-12-12 23:00:05")), class = "data.frame", row.names = c(NA, 
-9L))

Thanks, akrun, for the perfect answer. Could you please explain it in detail, if possible? – Saurabh Chauhan Jan 11 '19 at 09:39
Thanks for the perfect answer. Both answers are fine, but I can accept only one (a limitation of Stack Overflow :) ); otherwise I would accept both (yours and Cath's). Thanks for your time and effort. In the end, I am accepting akrun's answer because he was quicker. – Saurabh Chauhan Jan 11 '19 at 09:58
