I have a situation where I need to reformat a dataframe where each row contains an ID value, three years (stored in columns), and three predictions (stored in columns). If the year or prediction is not used then it contains a dummy value (-9999).
Here is an example dataframe:
# Example initial dataframe
sample = data.frame(
id = c(1,2,3),
year1 = c(2005, 2005, 2005),
year2 = c(2009, 2010, 2010),
year3 = c(2010, -9999, -9999),
pred1 = c(0, 0, 0),
pred2 = c(0, 1, 0),
pred3 = c(1, -9999, -9999)
)
My goal is to efficiently reformat the data frame into one where each row contains the ID, the current year, and the prediction (1/0):
The following code demonstrates the desired output dataframe:
# Code produces desired output format
desired_format = data.frame(
id = rep(1:3, each=6),
year = rep(seq(2005, 2010, 1), 3),
pred = c(c(0,0,0,0,0,1), c(0,1,1,1,1,1), c(0,0,0,0,0,0))
)
Here is a screenshot of the desired format from my R console.
I've come with with a rather elaborate way of doing this with for-loops. However, I've been unable to come up with dplyr or other data management library -based solution to make reformatting more efficient and more maintainable for anyone looking any my code in the future.