
I have a data frame df1 that records the emotion status of a certain user as a time series:

0.00
0.10
0.20
0.00
0.70 
....

And another data frame df2 that gives the number of records in df1 for each day:

2015-01-02   1
2015-01-03   2
2015-01-04   3

That is, the first value belongs to 01-02, the second and third values belong to 01-03, and so on.

Now I'd like to plot a point graph with date on the x-axis and emotional value on the y-axis. How can I do that? Furthermore, how can I skip all the 0.0 values and show only the non-zero ones? Thanks!

1 Answer


There are two separate steps in the solution I propose: first, create a vector dates with the sequence of dates used; second, use dplyr::filter to remove the zero values.

For the first step you can use a combination of lapply and rep as follows:

dates <- c(lapply(seq(nrow(df2)), function(idx) rep(df2$date[idx], df2$count[idx])), recursive = TRUE)

The c(..., recursive = TRUE) wrapped around it converts the resulting list to a flat vector.
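As a possible simplification, rep also accepts a vector for its times argument, so the same expansion can be sketched without lapply. The column names date and count follow the code above and are assumptions about how df2 is laid out; the sample values are the ones from the question.

```r
# Sample df2 from the question; the column names `date` and `count` are assumed.
df2 <- data.frame(
  date  = as.Date(c("2015-01-02", "2015-01-03", "2015-01-04")),
  count = c(1, 2, 3)
)

# Repeat each date as many times as its count; the Date class is preserved.
dates <- rep(df2$date, times = df2$count)

print(dates)
```

Because the result keeps its Date class, it can be passed to as.POSIXct or used directly as the x aesthetic without supplying an origin.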

Then combine into a data.frame, filter, and plot as follows:

library(dplyr)
library(ggplot2)
df3 <- cbind(df1, data.frame(dates = as.POSIXct(dates)))
ggplot(df3 %>% filter(state != 0.0), aes(x = dates, y = state)) + geom_point()
kasterma
  • I got an error: `Error in as.POSIXct.numeric(dates) : 'origin' must be supplied`. What does it mean? –  Feb 25 '16 at 08:58
  • I have added `origin = "2015-02-09"` but got this error: `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 12407, 12429`. Do you have any idea? Thanks! –  Feb 25 '16 at 09:14
  • Put your data in reproducible examples in the question and I can take a look at what is happening. The conversion issue is caused by me using the data in the question as strings, and you having them in a different type. The number of rows issue may be because there is something skewed in your data; I don't think I can help with that. – kasterma Feb 25 '16 at 10:46
  • I have already solved the problem of the differing number of rows, but the conversion issue still exists. I set the `origin` to "2015-02-09", and now every `date` value in `df3` is `2015-02-09`; only the seconds differ. –  Feb 25 '16 at 11:08
  • http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – kasterma Feb 25 '16 at 12:39
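The `'origin' must be supplied` error in the comments would be consistent with df2$date being stored as Date rather than as character strings, although the question does not confirm this. A minimal sketch under that assumption: flattening a list of Date values drops the class and leaves plain day counts since 1970-01-01. as.POSIXct then reads those numbers as seconds from the supplied origin, which would explain why every value lands on the origin date with only the seconds differing.

```r
# Assumed input: two dates stored as Date objects rather than strings.
d <- as.Date(c("2015-01-02", "2015-01-03"))

# Flattening a list of Date values drops the class and leaves plain numbers.
flat <- c(list(d[1], d[2]), recursive = TRUE)
print(class(flat))

# The numbers are days since 1970-01-01, so that is the origin to supply.
restored <- as.Date(flat, origin = "1970-01-01")
print(restored)
```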