Is it possible to create a binary variable based on the first appearance of another (date) variable?
For my thesis I am trying to create a variable that captures the number of first-time forecasts issued and revised during the month divided by the number of forecasts at the month-end for a firm in a given year. For convenience I would like to separate the first-time forecasts issued and revised in different columns.
Example data
dt <- data.table(
analyst = rep((1:2),10),
id = rep((1:5),4),
year = rep(as.Date(c('2009-12-31','2009-12-31','2010-12-31','2010-12-31'),format='%Y-%m-%d'),5),
fdate = rep(as.Date(c('2009-07-31','2009-02-26','2010-01-31','2010-05-15','2009-06-30','2009-10-08','2010-07-31','2010-11-30','2009-01-31','2009-06-26','2010-05-03','2010-04-13','2009-10-30','2009-11-02','2010-03-28','2010-10-14','2009-02-17','2009-09-14','2010-08-02','2010-10-03'),format='%Y-%m-%d')))
To create the variable, I used the following steps: First, identifying the issuance of the first-time forecasts for a given year (for firms by analysts) with the following code:
dt2 <- setkey(setDT(dt), id, year, analyst)[order(fdate),.SD[1L] ,by=list(id,year)]
However, this generates a table with only the first-time forecast by id, year and analyst. Secondly, I give the first-time forecasts the value 1 with:
dt3 <- print(dt2[, first:=1L])
Third, combine the two data.tables:
dt4 <- dt3[dt, on = c('id', 'year', 'analyst', 'fdate')]
Fourth, I replace the na for 0
dt4[is.na(dt4)] <- 0
Fifth, creating the revised binary variable:
dt4$rev <- ifelse(dt4$first == 0,"1", "0")
Last, I sum the number of first-time and revised forecasts for every month for a firm.
Is there a more elegant way of creating this variable so I can learn more of R/data.table? I have tried to incorporate the dcast function, based on the answers from:
R data.table - categorical values in one column to binary values in multiple columns
How to programmatically create binary columns based on a categorical variable in data.table?
However, it doesn't work out for me.
Current result, based on the previous mentioned steps:
id year analyst fdate first rev
1 2009-12-31 1 2009-07-31 1 0
1 2009-12-31 2 2009-10-08 0 1
1 2010-12-31 1 2010-05-03 1 0
1 2010-12-31 2 2010-10-14 0 1
2 2009-12-31 1 2009-02-17 1 0
2 2009-12-31 2 2009-02-26 0 1
2 2010-12-31 1 2010-07-31 0 1
2 2010-12-31 2 2010-04-13 1 0
3 2009-12-31 1 2009-10-30 0 1
3 2009-12-31 2 2009-09-14 1 0
3 2010-12-31 1 2010-01-31 1 0
3 2010-12-31 2 2010-11-30 0 1
4 2009-12-31 1 2009-01-31 1 0
4 2009-12-31 2 2009-11-02 0 1
4 2010-12-31 1 2010-08-02 0 1
4 2010-12-31 2 2010-05-15 1 0
5 2009-12-31 1 2009-06-30 0 1
5 2009-12-31 2 2009-06-26 1 0
5 2010-12-31 1 2010-03-28 1 0
5 2010-12-31 2 2010-10-03 0 1