I have a question regarding the imputation of missing values:

My dataset has more than one row for each individual. After data collection had begun, we introduced a new variable, so some individuals have missing values at the beginning of their observation period. My task is to replace all missing values with the first non-missing value for each individual.

For example:

set.seed(123)
d <- data.frame(
  id = rep(1:3, each = 10),
  year = rep(seq(2000,2002),10))

# Introduce NA values in the first rows
d[,2][1:3] <- NA
d[,2][11:14] <- NA
d[,2][20:27] <- NA

For each individual we have more than one observation. Individual 1 has 3 missing values, and the 4th value is 2000. Therefore, all missing values for individual 1 have to be replaced by 2000. For individual 2, all missing values have to be replaced by the 5th observation (2002), and so on.
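
The rule described above could be sketched in base R roughly as follows. This is only one possible approach, not a definitive solution. It assumes that the rows of each individual are already in observation order. The helper `fill_with_first` and the column name `year_filled` are hypothetical names chosen for illustration.

```r
# Rebuild the example data from the question
d <- data.frame(
  id = rep(1:3, each = 10),
  year = rep(seq(2000, 2002), 10))
d$year[1:3] <- NA
d$year[11:14] <- NA
d$year[20:27] <- NA

# Hypothetical helper: replace every NA in x with the first non-missing value of x.
# If x contains no non-missing value at all, it stays entirely NA.
fill_with_first <- function(x) {
  first <- x[!is.na(x)][1]
  x[is.na(x)] <- first
  x
}

# ave() applies the helper separately within each id and
# returns the results in the original row order
d$year_filled <- ave(d$year, d$id, FUN = fill_with_first)
```

Note that in the example data row 20, the last row of individual 2, is also NA. Because the helper fills every missing value with the first non-missing one, that row receives 2002 as well. Writing the result to a new column keeps the original `year` values available for checking.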

Of course, our dataset is very large, with about 10,000 observations and 2,000 individuals, so I can't do this by hand. Are there any smart solutions for this problem?

Thank you! :)

unicorn
