I'm trying to get an idea of the types of missing values in my panel dataset. I think there can be three cases:
- leading NAs: missing values before the data start for a given individual
- gaps: missing values for a few time periods, after which the data resume
- trailing NAs: missing values at the end of the data, if an individual stops reporting early
I'm not looking for functions that directly change them or fill them in. Instead, I want to decide what to do with them after I have an idea of the problem.
How to get rid of leading NAs (but not how to see how many you have) is solved here. Selecting all NAs at once is straightforward:
library(data.table)
Data <- as.data.table(iris)[,.(Species,Petal.Length)]
Data[, time := rep(1951:2000,3)]
Data[c(1:5,60:65,145:150), Petal.Length := NA]
# in Petal.Length, setosa has leading NAs, versicolor a gap, virginica NAs at the end
Data[is.na(Petal.Length)] # this is a mix of all three types of NA's
But I want to differentiate the three cases. Ideally, I'd like to select each case directly in data.table, along the lines of:
- "give me a data table with all observations that have leading NAs in Petal.Length"
- "give me a data table with observations that are gaps in Petal.Length"
- "give me a data table with observations that are NA's during the last time periods per individual"
For leading NAs I can get it done, but it feels super clumsy:
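# flag the first non-missing observation per Species (firstobs stays NA on missing rows)
Data[!is.na(Petal.Length), firstobs := ifelse(min(time) == time, 1, 0), by = Species]
# recover the time of that first observation for every row of the Species
Data[, mintime := max(firstobs * time, na.rm = TRUE), by = Species]
# rows before the first observation are the leading NAs
Data[time < mintime]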
I guess something similar could be done with max and leads for the trailing NAs, but I can't get my head around gaps, and those are the most important ones for me. The solutions I found online usually fill in, delete, or shift these NAs directly; I just want to have a look at them.
Desired output would be:
leading NAs:
Data[1:5]
gaps:
Data[60:65]
NA's at the end:
Data[145:150]
But I'd like to get these by checking where the three types of NAs are, as my actual dataset is too large to check this manually.
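To make the three cases concrete, here is a minimal sketch of how such a classification might look. It assumes the rows can be sorted by time within each individual, and it relies on `fcase`, which needs data.table 1.13.0 or later. The column name `na_type` and its labels are made up for illustration. The idea is that an NA is "leading" if no value has been observed yet for that individual, "trailing" if no value is observed afterwards, and a "gap" otherwise; none of this requires knowing the start year in advance.

```r
library(data.table)

# same toy data as above
Data <- as.data.table(iris)[, .(Species, Petal.Length)]
Data[, time := rep(1951:2000, 3)]
Data[c(1:5, 60:65, 145:150), Petal.Length := NA]

# the cumulative flags below assume rows are ordered by time within Species
setorder(Data, Species, time)

# na_type is a hypothetical helper column, not part of the original data
Data[, na_type := {
  obs         <- !is.na(Petal.Length)
  seen_before <- cumsum(obs) > 0            # a value was observed at or before this row
  seen_after  <- rev(cumsum(rev(obs))) > 0  # a value is observed at or after this row
  fcase(
    obs,          "observed",
    !seen_before, "leading",
    !seen_after,  "trailing",
    default =     "gap"
  )
}, by = Species]

Data[na_type == "leading"]   # NAs before the first observation
Data[na_type == "gap"]       # NAs between two observations
Data[na_type == "trailing"]  # NAs after the last observation
```

Note that under this sketch an individual with no observations at all would have every row labelled "leading", which may or may not be what is wanted.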
edit: I should add that in my real dataset, I don't know when every individual starts reporting data. So:
Data[is.na(Petal.Length), time, by= Species]
will not help me.