There are certain circumstances where we need to rotate the data from a wide format to a long format. I had a similar issue like this in some of my code but in reverse. Here is the question I posted. This also provides other options that can be used to solve this problem.
R- How to put multiple cases (rows) in one row
I would use reshape. Here is a link to the r documentation.
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/reshape
Here is the data.
policyno <- c (1001, 1002, 1003)
amount <- c (100, 200, 300)
info1 <- c (500001, 600001, 700001)
info2 <- c (500002, 600002, 700002)
info3 <- c (500003, 999999, 700003)
info4 <- c (999999, 999999, 700004)
info5 <- c (999999, 999999, 700005)
data <- data.frame (policyno, amount, info1, info2, info3, info4, info5)
data
There are a few things we need to input for reshape to work. We need the name of the data frame (data), the id variable (policyno). We also need the name of the new variable we want to create (info) and a list of the variables which make up this new variable (info1, info2, info3, info4, info5). For generating this list of variables, there are a few ways to do this. I have seen posts where we list the row number. If we have a very small dataset and the variables (and their row numbers stay the same), then this method is fine. If you are working with a very large dataset where the row numbers might change depending upon how you are subsetting the data, then we can have R identify the row numbers for us. Which and calnames can be used for this purpose.
data2 <- reshape (data,
idvar = "policyno",
v.names = c ("info"),
varying = list (c (which(colnames(data) == "info1"),
which (colnames(data) == "info2"),
which(colnames(data) == "info3"),
which(colnames(data) == "info4"),
which(colnames(data) == "info5")
)
),
direction = "long")
data2
This rotates the data but doesn't remove the 999999 cases. To remove the 999999, we can use subset. As I minor note, this reshape created a variable labeled time which we can remove using select.
data3 <- subset (data2,
select = -c (time),
info != 999999)
data3
Again, this is just one way that can be used to solve the problem. Stackoverflow has a lot of posts about reshape and transpose that are very useful.