Much of the following problem arises from the sheer size of the dataframe (198240 observations). I'll try to break it down as best as I can.
The Goal
I want to create a variable DURATION that records how long a house has been sick, i.e. the number of consecutive weeks it has been sick up to and including a given week.
The Known
- Household ID and Week (There are 1120 houses and 177 weeks)
- HDINC (currently-sick variable; a value above 0 means the house is sick that week)
- HDINC_1 (sick-the-week-prior variable, i.e. HDINC lagged by one week)
The Problem
I don't understand how to get a function or loop to traverse the dataframe by household and by time concurrently.
I know it will be a function or loop that goes something like the following (in logic, not in R code):
IF (hdinc > 0) {                  # the house is sick on this date
    Duration = 1
    look at hdinc_1
    IF (hdinc_1 == 0) {           # the house was not sick last week
        Duration = Duration + 0
        go on to the next date for that house
    }
    IF (hdinc_1 > 0) {            # the house was sick last week
        Duration = Duration + 1
        go to the same house, Week - 1, and look at hdinc_1 to see if it was sick the week before that
    }
}
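For what it's worth, the logic above seems to boil down to a running count of consecutive sick weeks that resets to 0 whenever a house has a healthy week. Below is a minimal sketch of that idea in base R, not a tested solution for the full data. It assumes the rows are already sorted by week within each house, that hdinc has no missing values, and that hdinc > 0 means "sick". The toy data frame, the helper name run_length, and the column name duration are made up for illustration.

```r
# Toy data: two hypothetical households, rows already in week order within each house
toy <- data.frame(
  id_casa = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  hdinc   = c(10, 20, 0, 5, 0, 0, 3, 4)
)

# Hypothetical helper: running count of consecutive sick weeks,
# resetting to 0 on every healthy week
run_length <- function(x) {
  sick <- x > 0
  # every healthy week starts a new block of rows
  block <- cumsum(!sick)
  # within each block, count the sick weeks seen so far
  ave(as.integer(sick), block, FUN = cumsum)
}

# Apply the helper separately within each household,
# so a run of sick weeks never carries over from one house to the next
toy$duration <- ave(toy$hdinc, toy$id_casa, FUN = run_length)
print(toy)
```

Because ave() is vectorised over groups, this avoids an explicit loop over rows, which may matter with roughly 200,000 observations.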
I am having trouble with the following:
- Getting it to start on a particular observation based on household/date
- Moving the function backwards or forwards while maintaining the household
- Eventually getting the function to restart using a different household
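The three points above all come down to visiting the rows one household at a time, in date order. A rough sketch of one way to set that up in base R follows, assuming the dates are month/day/year text (or a factor of such text). The toy values and the names by_house and first_sick are invented for illustration and are not from the real data.

```r
# Toy data in the same shape as the real data; the values are made up
toy <- data.frame(
  id_casa  = c(2L, 1L, 1L, 2L, 1L),
  survdate = c("10/20/2004", "1/11/2006", "12/28/2005", "10/13/2004", "1/4/2006"),
  hdinc    = c(32, 0, 142.9, 50, 0)
)

# Turn month/day/year text (or a factor) into real dates so they sort correctly
toy$survdate <- as.Date(as.character(toy$survdate), format = "%m/%d/%Y")

# Sort by household first, then by date within each household
toy <- toy[order(toy$id_casa, toy$survdate), ]

# split() returns one data frame per household;
# lapply() then visits each household in turn, "restarting" for every house
by_house <- split(toy, toy$id_casa)

# Example per-house computation: the first date on which the house was sick
first_sick <- lapply(by_house, function(h) h$survdate[which(h$hdinc > 0)[1]])
print(first_sick)
```

Once the rows are ordered this way, "the week before" for a given house is simply the previous row inside that house's piece of the split.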
I know this is really convoluted but I can't even get the loop to start to provide y'all sample code.
Sample Data:
dat <- structure(list(
  id_casa = c(802L, 802L, 802L, 802L, 802L, 802L, 802L, 955L, 955L, 955L, 955L),
  survdate = structure(c(3L, 10L, 5L, 1L, 2L, 4L, 11L, 6L, 7L, 8L, 9L),
    .Label = c("1/11/2006", "1/18/2006", "1/19/2005", "1/25/2006", "1/4/2006",
      "10/13/2004", "10/20/2004", "10/27/2004", "11/3/2004", "12/28/2005", "2/1/2006"),
    class = "factor"),
  hdinc = c(125, 142.85715, 0, 0, 0, 142.85715, 0, 50, 32, 159, 2.5),
  hdinc_1 = c(0, 125, 142.85715, 0, 0, 0, 142.85715, 0, 50, 32, 159)),
  .Names = c("id_casa", "survdate", "hdinc", "hdinc_1"),
  class = "data.frame", row.names = c(NA, -11L))
Sample Output: