I would like to plot a lifelines diagram for my data so that readers can understand how the data are shaped and what right censoring does to them.
Ideally, it would look something like [this][1].
I need a horizontal line for each participant, starting on the day observation began and ending on the last day we observed them. Participants should be shown in a different color (or with some other indicator) depending on whether their last day of observation is censored.
The data look like this (`duration` is in days):

| regdate | lastlogindate | censor | duration |
|---|---|---|---|
| 2010-02-24 02:30:43 | 2010-05-27 07:58:17 | 0 | 92 |
| 2007-12-23 11:16:37 | 2008-03-07 10:36:29 | 1 | 75 |
| 2009-01-19 04:23:28 | 2009-01-24 06:33:38 | 1 | 5 |
| 2010-07-25 10:24:39 | 2010-08-11 07:13:25 | 0 | 17 |
| 2009-08-23 07:18:06 | 2009-08-24 06:25:35 | 1 | 1 |
| 2007-08-12 07:24:55 | 2010-06-01 06:53:57 | 0 | 1024 |
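The `duration` column appears to count whole calendar days between the two dates, ignoring the time of day: the fourth row spans slightly less than 17 × 24 hours but is listed as 17, and the fifth spans less than 24 hours but is listed as 1. A quick base-R check on the six rows above, assuming that interpretation (`lifelines`, `calendar_days` and `elapsed_days` are just illustrative names):

```r
# The six rows shown above, typed in as character vectors
lifelines <- data.frame(
  regdate = c("2010-02-24 02:30:43", "2007-12-23 11:16:37", "2009-01-19 04:23:28",
              "2010-07-25 10:24:39", "2009-08-23 07:18:06", "2007-08-12 07:24:55"),
  lastlogindate = c("2010-05-27 07:58:17", "2008-03-07 10:36:29", "2009-01-24 06:33:38",
                    "2010-08-11 07:13:25", "2009-08-24 06:25:35", "2010-06-01 06:53:57"),
  censor = c(0, 1, 1, 0, 1, 0),
  duration = c(92, 75, 5, 17, 1, 1024),
  stringsAsFactors = FALSE
)

# as.Date() keeps only the "YYYY-MM-DD" part, so this counts calendar days
calendar_days <- as.numeric(as.Date(lifelines$lastlogindate) - as.Date(lifelines$regdate))

# Exact elapsed time in days, for comparison (UTC avoids daylight-saving shifts)
elapsed_days <- as.numeric(difftime(as.POSIXct(lifelines$lastlogindate, tz = "UTC"),
                                    as.POSIXct(lifelines$regdate, tz = "UTC"),
                                    units = "days"))

print(data.frame(calendar_days,
                 elapsed_days = round(elapsed_days, 2),
                 duration = lifelines$duration))
```

If the calendar-day reading is right, the lines in the plot can be drawn from the raw timestamps without worrying about how `duration` was rounded.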
UCLA has a page showing how it's done in Stata. I told my advisor I could reproduce in R whatever he did in Stata, so I need some help here, guys :)
EDIT: I finally managed to get it right.
Here is a sample of the data, produced with `dput()`:
```r
data <- structure(list(users_id = c(1747516, 913136, 921278, 1654913,
    782364, 1371798, 1174461, 1493894, 1124186, 1249310),
  regdate = c("2010-08-15 05:50:09", "2009-01-04 13:47:46", "2009-01-07 13:34:53", "2010-06-30 11:19:08", "2008-08-13 06:46:28", "2010-01-26 12:58:20", "2009-08-18 15:13:12", "2010-04-04 11:33:47", "2009-07-10 12:33:41", "2009-10-19 13:30:49"),
  lastlogindate = c("2010-09-01 05:51:34", "2010-09-17 05:25:00", "2009-05-15 07:55:30", "2010-07-02 07:34:02", "2008-10-25 14:29:50", "2010-03-17 05:04:58", "2010-07-06 03:48:48", "2010-04-09 19:44:42", "2010-09-03 04:18:18", "2009-10-20 06:26:55"),
  censor6 = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 1)),
  .Names = c("users_id", "regdate", "lastlogindate", "censor6"),
  row.names = c(1L, 2L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 14L),
  class = "data.frame")
```
I melted the data with the reshape2 package so that each participant has two rows, one for the start date and one for the end date. Then I added the censoring variable back with `merge()`.
```r
library(reshape2)

# Take a random subset of up to 25 participants
# (the dput sample above has only 10 rows, so cap the sample size)
sampData1 <- data[c("users_id", "regdate", "lastlogindate")]
sampData1 <- sampData1[sample(nrow(sampData1), min(25, nrow(sampData1))), ]
# Create two rows per participant: one for the start date, one for the end date
sampData1 <- melt(sampData1, id.vars = "users_id")
sampData1 <- sampData1[order(sampData1$users_id, sampData1$value), ]
# Add a grouping variable: basically the same as users_id, but it looks better on the plot
sampData1$ID <- rep(seq(1, nrow(sampData1) / 2, 1), each = 2)
# Put back the censoring variable
sampData1 <- merge(sampData1, data[, c("users_id", "censor6")])
sampData1$censor6 <- as.factor(sampData1$censor6)
sampData1$value <- as.POSIXct(sampData1$value, origin = "1970-01-01 00:00:00")
```
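If reshape2 is not available, the same long format can be built in base R. The sketch below uses the first three rows of the dput sample; it stacks a "start" copy and an "end" copy of each participant and carries `censor6` along, so no separate `merge()` is needed. The names `start_rows`, `end_rows` and `long` are illustrative.

```r
# First three rows of the dput() sample in the question
data <- data.frame(
  users_id      = c(1747516, 913136, 921278),
  regdate       = c("2010-08-15 05:50:09", "2009-01-04 13:47:46", "2009-01-07 13:34:53"),
  lastlogindate = c("2010-09-01 05:51:34", "2010-09-17 05:25:00", "2009-05-15 07:55:30"),
  censor6       = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# One row per start date and one per end date; censor6 travels along
start_rows <- data.frame(data[c("users_id", "censor6")], variable = "regdate",
                         value = data$regdate, stringsAsFactors = FALSE)
end_rows   <- data.frame(data[c("users_id", "censor6")], variable = "lastlogindate",
                         value = data$lastlogindate, stringsAsFactors = FALSE)
long <- rbind(start_rows, end_rows)

# Same ordering as the reshape2 version: by participant, then by date string
long <- long[order(long$users_id, long$value), ]
rownames(long) <- NULL

# Consecutive integer ID per participant (1, 1, 2, 2, ...)
long$ID <- match(long$users_id, unique(long$users_id))
long$censor6 <- as.factor(long$censor6)
long$value <- as.POSIXct(long$value, tz = "UTC")

print(long)
```

Using `match()` against `unique()` ties the ID to the participant rather than to row position, so it stays correct even if a participant ever ends up with more or fewer than two rows.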
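Now let's create the plot. Each `gp + ...` on its own line produces a separate plot and leaves `gp` unchanged, so the layers have to be chained with `+` into one expression. `size = 1` also belongs outside `aes()`; inside it, it is treated as a mapping and adds a spurious legend.

```r
library(ggplot2)
library(scales)  # for date_breaks()

# Base plot
gp <- ggplot(sampData1, aes(x = value, y = ID, group = ID, color = censor6)) +
  # Add the horizontal lines (this is the big deal);
  # linewidth needs ggplot2 >= 3.4.0, older versions use size = 1
  geom_line(linewidth = 1) +
  # Declutter the x-axis labels
  scale_x_datetime(breaks = date_breaks("3 months")) +
  # Rotate the x-axis labels
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # Change the legend title and colors
  scale_color_manual(name = "Censoring status", values = c("red", "blue"))
print(gp)
```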
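An alternative that skips the melt/merge step entirely: `geom_segment()` draws one segment per row straight from the wide data, and a `geom_point()` at the end date provides the "other indicator" for censoring. This is a sketch on the first three rows of the dput sample; `wide`, the colors, the sort order and the point shapes are illustrative choices, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# First three rows of the dput() sample, one row per participant
wide <- data.frame(
  users_id      = c(1747516, 913136, 921278),
  regdate       = as.POSIXct(c("2010-08-15 05:50:09", "2009-01-04 13:47:46",
                               "2009-01-07 13:34:53"), tz = "UTC"),
  lastlogindate = as.POSIXct(c("2010-09-01 05:51:34", "2010-09-17 05:25:00",
                               "2009-05-15 07:55:30"), tz = "UTC"),
  censor6       = factor(c(0, 0, 1))
)

# Stack participants by start date so the lines form a staircase (a layout choice)
wide <- wide[order(wide$regdate), ]
wide$ID <- seq_len(nrow(wide))

p <- ggplot(wide, aes(y = ID, colour = censor6)) +
  # One horizontal segment per participant, from start to last observation
  geom_segment(aes(x = regdate, xend = lastlogindate, yend = ID), linewidth = 1) +
  # Mark the end of each line; the shape also encodes censoring
  geom_point(aes(x = lastlogindate, shape = censor6), size = 3) +
  scale_colour_manual(values = c("0" = "red", "1" = "blue")) +
  labs(x = "Date", y = "Participant", colour = "censor6", shape = "censor6")

print(p)
```

Because colour and shape map to the same variable with the same title, ggplot2 merges them into a single legend.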