2

I have a data set of individuals with differing numbers of repeated observations and a value that is sometimes only filled in the final observation, say:

id <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
order <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
value <- c(NA, NA, NA, 3, NA, NA, NA, 6, NA, NA, NA, 1)
x <- data.frame(id, order, value)

Where ID represents each individual, order is the order of observations taken (1 is first, increasing by 1 with subsequent observations) and value is some value that needs backfilling (i.e. I need the NA's for each ID to be filled in a value if it is missing).

I am stuck on how to do this. I tried ordering the data first:

x <- x[order(x$id, -x$order, x$value),]

But I could not fathom how to get the code to work by selecting the previous observation that way (i.e. if id = previous id & value is missing, take previous version of value). The data is large (13m records) and there are many different numbers of order there (most have 1 observation, some may have up to 10). What would be the best way to do this?

1 Answers1

1

You could first select only the rows that contain a value:

x2 <- subset(x, x$value != "NA")

And, then merge the two data frames to fill in the empty slots with the value you obtained for each ID in x2, see value.y column:

merge(x, x2, by="id")

#   id order.x value.x order.y value.y
#1   1       1      NA       4       3
#2   1       2      NA       4       3
#3   1       3      NA       4       3
#4   1       4       3       4       3
#5   2       1      NA       4       6
#6   2       2      NA       4       6
#7   2       3      NA       4       6
#8   2       4       6       4       6
#9   3       1      NA       4       1
#10  3       2      NA       4       1
#11  3       3      NA       4       1
#12  3       4       1       4       1
VLC
  • 1,119
  • 1
  • 12
  • 14