As a different approach, I tried this plot: a boxplot for each day showing the distribution of per-user message counts, with a line connecting the daily mean number of messages per user. Here's the target plot:

I start by generating data using the method by @Sacha Epskamp. I generate a large dataset so that there is enough data for the intended plot:
library("ggplot2")
library("lubridate")
# This code is from Sacha Epskamp
# http://stackoverflow.com/a/10269840/1290420
# Generate a data set
set.seed(1)
start <- strptime("2012-01-05 00:00:00",
                  format="%Y-%m-%d %H:%M:%S")
end <- strptime("2012-03-05 00:00:00",
                format="%Y-%m-%d %H:%M:%S")
df <- data.frame(message.id = 1:10000,
                 user.id = sample(1:30, 10000,
                                  TRUE,
                                  prob=1:30),
                 message.date = seq(start,
                                    end,
                                    length=10000)
                 )
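A note on the `prob=1:30` argument: as far as I understand `sample`, the weights are normalised internally, so user 30 should end up roughly 30 times as active as user 1. A minimal sketch to check this (the names `ids`, `share` and `expected` are mine, for illustration only):

```r
# Draw user ids with the same weights as in the data set above
set.seed(1)
ids <- sample(1:30, 10000, replace = TRUE, prob = 1:30)

# Observed share of messages per user
share <- prop.table(table(ids))

# Share implied by the weights: weight divided by the sum of all weights
expected <- (1:30) / sum(1:30)

# Compare the least and most active users
round(c(observed = share[["1"]],  expected = expected[1]),  4)
round(c(observed = share[["30"]], expected = expected[30]), 4)
```

This is why the boxplots are fairly spread out: the per-user counts on any given day differ a lot by design.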
Then I struggle to wrangle the data frame into a shape suitable for the plot. I am sure that plyr gurus would be able to vastly improve this.
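# Clean up the data frame and add a column
# with combined day-user
df$day <- yday(df$message.date)
# Drop day 65 (2012-03-05), which holds only the final timestamp,
# and keep just the user.id and day columns
df <- df[ df$day!=65, c(2,4) ]
df$day.user <- paste(df$day, df$user.id, sep="-")
# Copy into new data frame with counts for each
# day-user combination
df2 <- aggregate(df,
                 by=list(df$day,
                         df$day.user),
                 FUN="length"
                 )
# Keep the two grouping columns and one column of counts
df2 <- df2[,c(1,2,3)]
names(df2) <- c("day", "user", "count")
# Strip the "day-" prefix so that only the user id remains
df2$user <- gsub(".+-(.+)", "\\1", df2$user)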
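For what it's worth, I think the same day-user counts can be had more directly by passing named grouping lists to `aggregate`, which skips both the pasted `day.user` key and the `gsub` step. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the generated data):

```r
# Toy stand-in for the cleaned-up df: one row per message
toy <- data.frame(user.id = c(1, 1, 2, 1, 2, 2),
                  day     = c(5, 5, 5, 6, 6, 6))

# Count rows for each day-user combination; the names given in
# the lists become the column names of the result
counts <- aggregate(data.frame(count = toy$user.id),
                    by = list(day = toy$day,
                              user = toy$user.id),
                    FUN = length)
counts
```

One difference to be aware of: here `user` stays numeric, whereas the `gsub` route above leaves it as a character string. That does not matter for the plot below, which only uses `day` and `count`.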
Then drawing the plot is the easy part:
p <- ggplot(df2,
            aes(x=day,
                y=count))
p <- p + geom_boxplot(aes(group=day), colour="grey80")
# `fun` replaces `fun.y`, which was deprecated in ggplot2 3.3.0;
# on older versions use fun.y=mean instead
p <- p + stat_summary(fun=mean,
                      colour="steelblue",
                      geom="line",
                      size=1)
p <- p + stat_summary(fun=mean,
                      colour="red",
                      geom="point",
                      size=3)
p
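To make clear what the line and the red points represent: for each day, `stat_summary` takes the mean of `count` over the rows present for that day. Since the aggregation only produces rows for users who sent at least one message, users with no messages on a day are not counted as zeros. A small sketch of the same calculation by hand, on made-up values (`toy` is a hypothetical stand-in for `df2`):

```r
# Toy stand-in for df2: one row per day-user combination
toy <- data.frame(day   = c(5, 5, 5, 6, 6),
                  user  = c("1", "2", "3", "1", "2"),
                  count = c(2, 4, 6, 1, 3))

# Mean count per day, over the users that appear on that day
daily_mean <- tapply(toy$count, toy$day, mean)
daily_mean
```

If silent users should pull the mean down, the missing day-user combinations would have to be added with a count of zero before plotting.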