
I'm working with a data frame that has the columns `timestamps` and `amount`. The data can be produced like this:

```r
set.seed(1)  # make the random sample reproducible
sample_size <- 40

start_date <- as.POSIXct("2020-01-01 00:00")
end_date   <- as.POSIXct("2020-01-03 00:00")

# pick `sample_size` random minutes between the two dates
timestamps <- sample(seq(start_date, end_date, by = 60), sample_size)
amount     <- rpois(sample_size, 5)

df <- data.frame(timestamps = timestamps, amount = amount)
```

Now I'd like to plot the sum of `amount` for each time bin (e.g. every hour, 30 minutes or 20 minutes). The final plot should look like a histogram of the timestamps, except that each bar shows the total `amount` that fell into the bin rather than just the number of timestamps.

How can I approach this? I could create an extra vector holding the total amount for each time bin, but I don't know how to proceed from there.
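One way to get those totals without ggplot2 is to round each timestamp down to the start of its bin and then sum `amount` per bin. Below is a minimal base-R sketch, assuming the timestamps are in UTC; `bin_floor()` is a hypothetical helper standing in for `lubridate::floor_date()`:

```r
set.seed(42)
sample_size <- 40
start_date <- as.POSIXct("2020-01-01 00:00", tz = "UTC")
end_date   <- as.POSIXct("2020-01-03 00:00", tz = "UTC")
df <- data.frame(
  timestamps = sample(seq(start_date, end_date, by = 60), sample_size),
  amount     = rpois(sample_size, 5)
)

# Hypothetical helper: round each timestamp down to the start of its bin.
# Bins are `minutes` wide and aligned to the epoch (assumes UTC).
bin_floor <- function(ts, minutes) {
  width <- minutes * 60
  as.POSIXct(floor(as.numeric(ts) / width) * width,
             origin = "1970-01-01", tz = "UTC")
}

df$bin <- bin_floor(df$timestamps, 20)

# Sum `amount` within each bin (one row per non-empty bin).
totals <- aggregate(amount ~ bin, data = df, FUN = sum)

# Bar height is the summed amount, not the number of timestamps.
barplot(totals$amount,
        names.arg = format(totals$bin, "%m-%d %H:%M"),
        las = 2, cex.names = 0.6, ylab = "total amount")
```

Unlike a histogram, bins with no timestamps simply get no bar here; changing the `20` to `30` or `60` switches the bin width.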

I'd also like a second mode that collapses everything onto a single day (note that the range between `start_date` and `end_date` is two days): for each time bin (say, every hour) the bar shows the total amount in that hour across all days. For example, the data

```
2020-01-01 13:03:00  5
2020-01-02 13:21:00 10
2020-01-02 13:38:00  1
2020-01-01 13:14:00  3
```

should produce a single bar of height `sum(5, 10, 1, 3) = 19` for the 13:00-14:00 bin. How can I implement the plotting so that I can easily switch between the two modes (plot all days vs. collapse everything onto one day)?
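One possible way to switch between the two modes is a single flag: when collapsing, keep only the seconds since midnight before binning, so every bin lands on 1970-01-01 and only the time of day matters. A minimal sketch, assuming UTC timestamps (so every day is exactly 86400 seconds long); `sum_by_bin()` and `collapse_days` are hypothetical names:

```r
# Hypothetical helper: total `amount` per bin of `minutes` width.
# collapse_days = TRUE drops the date and keeps only the time of day.
# Assumes UTC timestamps, so every day is exactly 86400 seconds long.
sum_by_bin <- function(ts, amount, minutes, collapse_days = FALSE) {
  secs  <- as.numeric(ts)            # seconds since 1970-01-01 00:00 UTC
  width <- minutes * 60
  if (collapse_days) {
    secs <- secs %% 86400            # seconds since midnight
  }
  bin <- as.POSIXct(floor(secs / width) * width,
                    origin = "1970-01-01", tz = "UTC")
  aggregate(amount ~ bin,
            data = data.frame(bin = bin, amount = amount),
            FUN = sum)
}

# The four example rows from the question.
ex <- data.frame(
  timestamps = as.POSIXct(c("2020-01-01 13:03", "2020-01-02 13:21",
                            "2020-01-02 13:38", "2020-01-01 13:14"),
                          tz = "UTC"),
  amount = c(5, 10, 1, 3)
)

per_day <- sum_by_bin(ex$timestamps, ex$amount, 60)                        # one bar per day and hour
one_day <- sum_by_bin(ex$timestamps, ex$amount, 60, collapse_days = TRUE)  # hours across all days

barplot(one_day$amount, names.arg = format(one_day$bin, "%H:%M"),
        ylab = "total amount")
```

With these rows, `per_day` has two 13:00 bins (8 on Jan 1, 11 on Jan 2), while `one_day` has a single 13:00 bin with the total of 19. For a local time zone with daylight saving time, the `%% 86400` shortcut would not be exact.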

EDIT: Following the advice of @Gregor Thomas I added a grouping column like this:

```r
df$time_group <- lubridate::floor_date(df$timestamps, unit = "20 minutes")
```

Now I'm wondering how to ignore the dates, i.e. aggregate by 20-minute time-of-day bin regardless of the date.
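Once a grouping column like `time_group` exists, one simple way to ignore the date is to group on its formatted time string. Zero-padded `"HH:MM"` strings sort chronologically within a day, so `tapply()` returns the slots in order. A minimal sketch, assuming UTC timestamps; `bin_floor()` is a hypothetical base-R stand-in for the `lubridate::floor_date()` call above:

```r
# Hypothetical stand-in for lubridate::floor_date(ts, unit = "<n> minutes").
bin_floor <- function(ts, minutes) {
  width <- minutes * 60
  as.POSIXct(floor(as.numeric(ts) / width) * width,
             origin = "1970-01-01", tz = "UTC")
}

df <- data.frame(
  timestamps = as.POSIXct(c("2020-01-01 13:03", "2020-01-02 13:21",
                            "2020-01-02 13:38", "2020-01-01 13:14"),
                          tz = "UTC"),
  amount = c(5, 10, 1, 3)
)
df$time_group <- bin_floor(df$timestamps, 20)

# Drop the date: keep only the "HH:MM" of each bin start.
df$slot <- format(df$time_group, "%H:%M")

# Total amount per time-of-day slot, summed over all days.
slot_totals <- tapply(df$amount, df$slot, sum)
barplot(slot_totals, ylab = "total amount")
```

Here the 13:03 and 13:14 rows fall into the `"13:00"` slot (5 + 3) and the 13:21 and 13:38 rows into `"13:20"` (10 + 1). Slots without any data do not appear in `slot_totals`.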

s1624210
  • Add a grouping column at whatever grain you want - hour, half hour, 20 min, whatever. With `ggplot`, if you are using `geom_col`, it will add up the values by default. With base (or also with ggplot) you could also summarize your data to the aggregate level, using your favorite method from the [FAQ on summing by groups](https://stackoverflow.com/q/1660124/903061). If you need more help than that, please show your attempt and where you are getting stuck. – Gregor Thomas Mar 31 '20 at 14:26
  • `lubridate::floor_date` can be used for your grouping variables, e.g., `df$group_20min = lubridate::floor_date(df$timestamps, unit = "20 minutes")` – Gregor Thomas Mar 31 '20 at 14:29
  • @GregorThomas What do you mean by _grouping column_? What should a grouping column contain? – s1624210 Mar 31 '20 at 14:38
  • I mean `df$group_20min = lubridate::floor_date(df$timestamps, unit = "20 minutes")`, `df$group_20min` is now a grouping column. It contains the datetime at the start of each of your time frames. Use it on the x axis. – Gregor Thomas Mar 31 '20 at 14:42
  • @GregorThomas Thank you, I was just using this. You can see my edit. I'll also look into ggplot's `geom_col` – s1624210 Mar 31 '20 at 14:44
  • Ah -- missed the part about ignoring the date, just paying attention to the time. Pick an arbitrary day and set the date parts to that day.... don't have time to write up a full answer right now, but hopefully someone else will be along shortly – Gregor Thomas Mar 31 '20 at 14:48
  • @GregorThomas Thanks a lot. I'll try to work it out myself and report back. This really helped me. – s1624210 Mar 31 '20 at 14:48
  • I think a quick way is `yday(df$group_20min) = 1`. That will set the date of the grouping column to Jan 1 (will work assuming everything is in the same year, otherwise set the year too). Then you need to format the axis with `scale_x_time` to not show the date, only the time. – Gregor Thomas Mar 31 '20 at 14:51
  • @GregorThomas I was just using `lubridate::date(df$group_30min_no_date) <- as.Date("1970-01-01")` and will try to work it out with `yday` as well. – s1624210 Mar 31 '20 at 14:58
  • @GregorThomas Ok, now I'm just stuck at ggplot's `geom_col`. I'm new to R and never used ggplot, but I'll look up examples online. – s1624210 Mar 31 '20 at 15:05
  • `ggplot(df, aes(x = group_30min_no_date, y = amount)) + geom_col()`. Your `date` method for setting the date looks great, more robust than my `yday`. – Gregor Thomas Mar 31 '20 at 15:08
  • @GregorThomas Perfect. This doesn't have much to do with the question anymore, but can you recommend an introduction to ggplot2? I'll move this discussion to chat now. – s1624210 Mar 31 '20 at 15:14
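Putting the comment thread together in ggplot2: `geom_col()` stacks rows that share an x value, so the visible bar height is the summed `amount` without pre-aggregating. A minimal sketch, assuming UTC timestamps and a 30-minute bin. The date is dropped with base arithmetic (`%% 86400`) instead of `lubridate::date<-`, giving the same 1970-01-01 reference day. Because the column is still POSIXct, this sketch labels the axis with `scale_x_datetime(date_labels = "%H:%M")`; as far as I know, `scale_x_time` is meant for `hms` time-of-day values rather than POSIXct.

```r
library(ggplot2)

set.seed(1)
start_date <- as.POSIXct("2020-01-01 00:00", tz = "UTC")
end_date   <- as.POSIXct("2020-01-03 00:00", tz = "UTC")
df <- data.frame(
  timestamps = sample(seq(start_date, end_date, by = 60), 40),
  amount     = rpois(40, 5)
)

# Start of the 30-minute slot, moved onto 1970-01-01 so only the time of day
# remains (assumes UTC, i.e. no DST, so a day is exactly 86400 seconds).
width <- 30 * 60
df$group_30min_no_date <- as.POSIXct(
  floor((as.numeric(df$timestamps) %% 86400) / width) * width,
  origin = "1970-01-01", tz = "UTC"
)

# geom_col() stacks rows with the same x, so each bar's total height
# equals the summed amount for that time-of-day slot.
p <- ggplot(df, aes(x = group_30min_no_date, y = amount)) +
  geom_col() +
  scale_x_datetime(date_labels = "%H:%M") +
  labs(x = "time of day", y = "total amount")
print(p)
```

To plot all days instead, drop the `%% 86400` so the slot keeps its date, and adjust `date_labels` to include the day.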
