I've been provided with a couple of very awkwardly formatted data in excel which I need to reshape so it fits to run a survival analysis in R.
I uploaded an extract of the data to Google drive: https://drive.google.com/open?id=1ret3bCDCYPDALQ16YBloaeopfl2-qVbp
The original data frame has about 2100 observations and 950 variables
Here is the basic data frame:
my.data<-data.frame(
ID=c( "", "","C8477","C5273","C5566"),
LR=c("2012Y","State:FL",5,6,8),
LR=c("2012Y","State:AZ",5,8,10),
LR=c("2011Y","State:FL",7,2,1)
)
my.data
# ID LR LR.1 LR.2
# 1 2012Y 2012Y 2011Y
# 2 State:FL State:AZ State:FL
# 3 C8477 5 5 7
# 4 C5273 6 8 2
# 5 C5566 8 10 1
All the columns have the same name "LR". I don't know if this will be a problem later...
The Year is given in row 1 and the according state the observation happened in row 2.
As output I need to have some panel data that I work with in later survival analysis.
my.data<-data.frame(
ID=c("C8477","C5273","C5566"),
Year=c("2012","2012","2011"),
State=c("FL","AZ","FL"),LR=c(5,8,1)
)
my.data
# ID Year State LR
# 1 C8477 2012 FL 5
# 2 C5273 2012 AZ 8
# 3 C5566 2011 FL 1
I played around with the reshape function and seq functions, but non of that will help me move in the right direction, as the data frame is so oddly arranged.