
I'm hoping you can help me with creating a variable that will count a "run" since a last event of another variable, using the R programming language. The data set with which I'm working is country-year panel data, and is unbalanced.

I'll illustrate what I'd like to do below. COUNTRY and YEAR are the cross-section identification and time unit respectively. COUNTRYYEAR is a concatenation of both variables, there to create an index for each unique observation.

Let EVENT be a binary indicator, marking whether an event of interest is present (EVENT = 1) or not (EVENT = 0). Let COUNTZERO be a discrete count variable, marking the time (here: years) since the last observed 1 on the EVENT variable. Let COUNTONE be another discrete count variable, marking a running count of consecutive ones of the EVENT variable. I'd like to have a data frame that looks like this:

COUNTRYYEAR COUNTRY YEAR EVENT COUNTZERO COUNTONE
10011950       1    1950  1       0         1
10011951       1    1951  1       0         2
10011952       1    1952  0       1         0 
10011953       1    1953  0       2         0 
10011954       1    1954  0       3         0 
10011955       1    1955  0       4         0 
10011956       1    1956  0       5         0

....

10021950       2    1950  1       0         1
10021951       2    1951  0       1         0
10021952       2    1952  1       0         1
10021953       2    1953  0       1         0
10021954       2    1954  0       2         0
10021955       2    1955  0       3         0
10021956       2    1956  0       4         0

....

10031975       3    1975  1       0         1
10031976       3    1976  1       0         2
10031977       3    1977  1       0         3
10031978       3    1978  1       0         4
10031979       3    1979  0       1         0
10031980       3    1980  0       2         0

....

The data go on. The panel is unbalanced: some countries are observed from the beginning (in my illustration: 1950) and others aren't, and some countries drop out before the right-hand end of the temporal domain while others don't. Some countries have all zeroes on the event and some have all 1s.

How can I go about creating those running count variables from the current EVENT variable I have? I looked at this solution, but, after running the example, it didn't quite create the vector I want to create.

Any input would be greatly appreciated.

Reproducible code of this illustration follows.

country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3) 
year <- c(1950, 1951, 1952, 1953, 1954, 1955, 1956, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1975, 1976, 1977, 1978, 1979) 
event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0) 
Data=data.frame(country=country, year=year, event=event)
steve
  • Would you mind providing a reproducible data set? http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Dason May 10 '13 at 18:05
  • `country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)` `year <- c(1950, 1951, 1952, 1953, 1954, 1955, 1956, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1975, 1976, 1977, 1978, 1979)` `event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)` `Data=data.frame(country=country, year=year, event=event)` – steve May 10 '13 at 18:19
  • If you read that post I linked, it explains how you can easily make an example. – Dason May 10 '13 at 18:20

3 Answers

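You could use a combination of rle and seq:

# Simulated unbalanced panel: 10, 9 and 3 observations for countries 1 to 3
reps <- c(10, 9, 3)
# lapply() always returns a list, so unlist() yields a plain vector
offsets <- unlist(lapply(reps, seq))
dat <- data.frame(country = rep(1:3, reps), year = 1950 + offsets, event = rbinom(sum(reps), 1, .5))

# Position of each row within its run of identical event values
# (rle() is applied to the whole column, not country by country)
o <- rle(dat$event)
sequence <- unlist(lapply(o$lengths, seq))
dat$countzero <- sequence
dat$countzero[dat$event != 0] <- 0
dat$countone <- sequence
dat$countone[dat$event != 1] <- 0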

which gives

> dat
   country year event countzero countone
1        1 1951     0         1        0
2        1 1952     0         2        0
3        1 1953     0         3        0
4        1 1954     0         4        0
5        1 1955     1         0        1
6        1 1956     0         1        0
7        1 1957     0         2        0
8        1 1958     1         0        1
9        1 1959     0         1        0
10       1 1960     1         0        1
11       2 1951     0         1        0
12       2 1952     1         0        1
13       2 1953     1         0        2
14       2 1954     1         0        3
15       2 1955     1         0        4
16       2 1956     0         1        0
17       2 1957     0         2        0
18       2 1958     0         3        0
19       2 1959     1         0        1
20       3 1951     0         1        0
21       3 1952     0         2        0
22       3 1953     0         3        0
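
One caveat with the snippet above: rle() is run on the full event column, so a run can continue from the last rows of one country into the first rows of the next. Below is a minimal sketch of a per-country variant on the question's example data. It assumes rows are sorted by year within each country, and run_position is a hypothetical helper name, not something from the original answer.

```r
# Example data from the question
country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
year <- c(1950:1956, 1950:1958, 1975:1979)
event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
Data <- data.frame(country = country, year = year, event = event)

# Hypothetical helper: position of each element within its run of equal values
run_position <- function(x) sequence(rle(x)$lengths)

# ave() applies the helper separately within each country,
# so run positions restart at the first row of every country
pos <- ave(Data$event, Data$country, FUN = run_position)

# Keep the run position only where the event value matches
Data$countzero <- ifelse(Data$event == 0, pos, 0)
Data$countone  <- ifelse(Data$event == 1, pos, 0)

Data[Data$country == 2, ]
```

Note that this counts rows, not calendar years, so if a country has gaps in its years the counts would not equal elapsed time.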
Dason

Here's a data.table solution with sequence and rle:

library(data.table)
DT <- data.table(Data)
DT[, c("count_zero", "count_one") := {
  rr <- sequence(rle(!event)$lengths)
  list(rr * !event, rr * event)
}]
#     country year event count_zero count_one
#  1:       1 1950     1          0         1
#  2:       1 1951     1          0         2
#  3:       1 1952     0          1         0
#  4:       1 1953     0          2         0
#  5:       1 1954     0          3         0
#  6:       1 1955     0          4         0
#  7:       1 1956     0          5         0
#  8:       2 1950     1          0         1
#  9:       2 1951     0          1         0
# 10:       2 1952     1          0         1
# 11:       2 1953     0          1         0
# 12:       2 1954     0          2         0
# 13:       2 1955     0          3         0
# 14:       2 1956     0          4         0
# 15:       2 1957     0          5         0
# 16:       2 1958     0          6         0
# 17:       3 1975     1          0         1
# 18:       3 1976     1          0         2
# 19:       3 1977     1          0         3
# 20:       3 1978     1          0         4
# 21:       3 1979     0          1         0
#     country year event count_zero count_one
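
On the example data the ungrouped call happens to give the wanted result, because every country there starts with a 1. If one country ends on a zero and the next starts on a zero, the run would carry over the boundary. A small sketch of the difference follows. The toy panel and the column names zero_all and zero_by are made up for illustration and are not part of the answer above.

```r
library(data.table)

# Hypothetical toy panel: country 1 ends on a zero and country 2 starts on one
DT <- data.table(country = c(1, 1, 1, 2, 2),
                 year    = c(1950, 1951, 1952, 1950, 1951),
                 event   = c(1, 0, 0, 0, 1))

# Ungrouped: the run of zeros carries over from country 1 into country 2
DT[, zero_all := {
  rr <- sequence(rle(!event)$lengths)
  rr * !event
}]

# Grouped: run positions restart at the first row of each country
DT[, zero_by := {
  rr <- sequence(rle(!event)$lengths)
  rr * !event
}, by = country]

DT$zero_all  # 0 1 2 3 0
DT$zero_by   # 0 1 2 1 0
```

Whether a country's leading zeros should count at all (there is no earlier event to count from) is a separate decision that the question leaves open.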
Arun
  • Thank you very much, and thanks to Dason as well for the input. I think this is what I want to do. – steve May 10 '13 at 18:36
  • I realised that `by=country` is not necessary and have made the edit. You may want to check the answer again. – Arun May 10 '13 at 18:47
  • It looks like it came out the same. Thanks for offering a solution that works within the confines of having to deal with an existing data frame. – steve May 10 '13 at 18:57

You can use this:

count_since <- function(trigger) {
  i <- seq_along(trigger)
  # rows elapsed since the most recent TRUE; 0 until the first TRUE is seen
  (i - cummax(i * trigger)) * cummax(trigger)
}

count_since(event) and count_since(!event) are the calls one would use in your example. For instance:

count_since(1:100 %% 5 == 0)
  [1] 0 0 0 0 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1
 [72] 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0
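
A sketch of how this might be applied to the question's panel, assuming rows are sorted by year within country and using ave() to restart the count for each country (the ave() step is an addition here, not part of the answer above). As far as I can tell, count_since(!event) returns 0 for a run of ones that starts a series, since no zero has been seen yet, so it would not reproduce COUNTONE for those rows.

```r
count_since <- function(trigger) {
  i <- seq_along(trigger)
  (i - cummax(i * trigger)) * cummax(trigger)
}

# Example data from the question
country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
year <- c(1950:1956, 1950:1958, 1975:1979)
event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
Data <- data.frame(country = country, year = year, event = event)

# Rows since the last event, computed separately within each country
Data$countzero <- ave(Data$event, Data$country, FUN = count_since)

# A run of ones in mid-series is counted ...
count_since(!c(0, 1, 1, 0))  # 0 1 2 0
# ... but a run of ones at the start of a series is not
count_since(!c(1, 1, 0))     # 0 0 0
```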

Nick Nassuphis