Compare each row value to multiple rows and return a value

Question

I am trying to do the following in R.

I have a data frame with time (in hours) in the first column and either zero or one in the second. The interval between consecutive time steps is 1 hour. It would be easier if I could attach a sample file, but I don't know how to attach one. I am trying to find out how many 1's occur that are 24 hours apart. More than one "1" in a 24-hour period is counted as a single "1".

Let's assume a counter, cnt, is initialized at 0.

I want to compare each row in the second column to the second-column values in the following 24-hour window. If there is more than one "1" in any 24-hour period, it counts as a single "1" in that 24-hour window.

In FORTRAN, I would set up a counter, go to each time step, and compare the value with those for the next 24 hours/time steps. If a "1" is found, I would increase the counter by 1 for the first "1". If there is another "1" in the same 24-hour period, I would not increase the counter again; I would move to the next row and keep doing so until the end of the file.

Hopefully I have explained what I want. If it's not clear, let me know. I think something can be done with the match() function or the plyr package, but I cannot figure out how.

You don't need to 'attach' a sample file, but it would be helpful to provide some code generating some sample data so that we can see exactly what your actual data looks like. — msoftrain, May 29 '14 at 17:49
If I get it right, you want to find out which of your overlapping 24-hour windows have at least one "1"; i.e. the output could be a simple T/F vector? — alexis_laz, May 29 '14 at 18:01
set.seed(1); dat <- data.frame(exceed = sample(c("par1", "par2"), 4000, replace = TRUE)); dummy <- model.matrix(~ exceed - 1, data = dat). Assume the row number is the time (hour) and the first column holds the values I want to compare. I'd really appreciate any help on this. — user2653586, May 29 '14 at 18:17
@alexis_laz: Not sure. I was trying to do a 24-hour moving sum of the numbers, which tells me there is at least one "1" whenever the moving sum is greater than or equal to one, but how do I tell whether the occurrences are 24 hours apart from each other? — user2653586, May 29 '14 at 18:24
I'm sorry - I'm still confused. Could you provide a small "x" (of 0's and 1's) taking into account "weird" cases of occurrences and show the expected output, assuming a -e.g.- 3-hour interval (not 24) for convenience? — alexis_laz, May 29 '14 at 18:33
I think I solved it using some help from here: http://stackoverflow.com/questions/15466880/cumulative-sum-until-maximum-reached-then-repeat-from-zero-in-the-next-row — user2653586, May 29 '14 at 19:26
I added a counter in order to count the number of times "1" appears in the 24-hour window. — user2653586, May 29 '14 at 19:27
It's unclear if you mean you want to count the number of intervals between 1s where that interval is 24H or greater, or you want to count the number of 24 hour blocks aligned to wall clock that contain at least one "1", or the unaligned case of the same. Please clarify. — Alex Brown, May 29 '14 at 21:32
There are two conditions here: (1) the "1"s need to be 24 hours apart and (2) if two or more "1"s occur in a 24-hour window, it should be assumed that there is actually one "1" in that window. I created a data frame with the time difference between consecutive rows where a "1" occurs, then calculated the cumulative sum of the time differences with the condition that if the cumulative sum exceeds 24, it resets to zero. This worked for me. — user2653586, May 30 '14 at 14:07
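
The counting rule described in the question and in the last comment could be sketched roughly as below. This is only an illustration, not the asker's actual code: count_events is a hypothetical helper name, the row index is assumed to be the hour, and a "1" is assumed to start a new event when it is at least `window` hours after the last counted "1" (change >= to > if exactly 24 hours apart should still count as the same event).

```r
# Hypothetical helper: count "1"s, treating every 1 that falls within
# `window` hours of an already counted 1 as part of the same event.
count_events <- function(x, window = 24) {
  hours <- which(x == 1)   # assumption: row index is the time in hours
  cnt <- 0
  last_counted <- -Inf     # hour of the most recently counted 1
  for (h in hours) {
    # assumption: a gap of at least `window` hours starts a new event
    if (h - last_counted >= window) {
      cnt <- cnt + 1
      last_counted <- h
    }
  }
  cnt
}

# Small example with a 3-hour window instead of 24
x <- c(0, 1, 1, 0, 0, 1, 0, 0, 0, 1)
count_events(x, window = 3)
```

With this input the 1s sit at hours 2, 3, 6 and 10; the one at hour 3 is absorbed by the one at hour 2, so the call returns 3.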

score 1 · Answer 1 · answered May 29 '14 at 21:29

The IRanges package in Bioconductor can help here. First, calculate the running sum of score (your 0/1 column) over windows of size 24. Then cap the sums at 1.

rs <- IRanges::runsum(score, 24)
pmin(rs, 1)

Is that what you want? You might want to play with the endrule parameter.
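
If installing Bioconductor is not an option, the same idea (running sum over a window, capped at 1) can be sketched in base R with cumsum. This is a minimal sketch and not the IRanges implementation: window_has_one is a hypothetical helper name, and incomplete windows at the end of the series are simply dropped.

```r
# Hypothetical helper: for each start hour i, return 1 if rows
# i .. i + window - 1 contain at least one 1, otherwise 0.
window_has_one <- function(x, window = 24) {
  n <- length(x)
  if (n < window) return(numeric(0))   # no complete window available
  cs <- c(0, cumsum(x))                # cs[i + 1] is the sum of x[1..i]
  # running sum of each complete window, via a difference of cumulative sums
  rs <- cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]
  pmin(rs, 1)                          # cap at 1, as in the answer
}

# Small example with a 3-hour window instead of 24
x <- c(0, 1, 1, 0, 0, 0, 0, 1)
window_has_one(x, window = 3)
```

For this input the result is 1 1 1 0 0 1. Note that the windows overlap, so this flags which windows contain a "1" but does not by itself count events that are 24 hours apart.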
