I have a big dataset and I'd like to plot zizi against hour, with a single value per hour. The variables look like this:

> datasetjc$hour[1:100]
  [1] 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
 [40] 23 23 23 23 23 23 23 23 23 23 23 23 23 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [79]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 

> datasetjc$zizi[1:100]
  [1]  2 27  2  3 45  0  6  0 15  8  3  1  4  0  0 15  1 13  0 15 23  8 21  2  0  9 43 26 31 33 11  0  4  7 26  2 25 14  1
 [40]  3  1  6  3  4  3  2 27  2  3 45  0  7  0 15  8  3  1  4  0  4 26  0 15  1  4  0 15 14 12 23  8  3 21 13  2  0 32 43
 [79] 31 11  4  0  4  7 26 10  2 25 25  1  1  4  4 23  3  2 27  2 45  0
> 

I also have minute, date and day variables. Consecutive observations are 5 minutes apart. How can I make this plot?

Thx

BillyLeZeurbé

1 Answer

If we take your question to be "how do I get hourly summaries of data taken at five-minute intervals", then this is a classic split-apply-combine problem. The post "Average data by group" is a great summary of the different techniques.

For this particular example, in vanilla R you use the aggregate or by function.

> df <- data.frame(hour=c(1,1,2,2,3,3,4,4), zizi=1:8)
> aggregate(zizi ~ hour, data=df, mean)
  hour zizi
1    1  1.5
2    2  3.5
3    3  5.5
4    4  7.5
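
Since the question asks for a plot, here is a minimal sketch of plotting the aggregated result with base graphics. It reuses the toy df from above rather than the real datasetjc, and the name hourly is only illustrative; swap in your own data frame and summary function as needed.

```r
# Toy data standing in for datasetjc: two readings per hour (made-up values)
df <- data.frame(hour = c(1, 1, 2, 2, 3, 3, 4, 4), zizi = 1:8)

# Collapse to one mean value of zizi per hour
hourly <- aggregate(zizi ~ hour, data = df, FUN = mean)

# Plot the hourly means; type = "b" draws points joined by lines
plot(hourly$hour, hourly$zizi, type = "b",
     xlab = "hour", ylab = "mean zizi")
```

With the real data, the first argument to aggregate stays the same and only data= changes.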

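If you wish to summarize by day and hour, add the extra grouping variable with + (this assumes df also has a day column, which the toy df above does not):

> aggregate(zizi ~ hour + day, data=df, mean)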
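
Grouping by day as well as hour matters for data like that in the question, where hour 23 is followed by hour 0: grouping on hour alone would pool readings from different days. The sketch below uses a made-up data frame df2 with an assumed numeric day column; the column names and values are illustrative only, not taken from datasetjc.

```r
# Made-up data spanning two days, with readings at hours 23 and 0 on each day
df2 <- data.frame(
  day  = c(1, 1, 1, 1, 2, 2, 2, 2),
  hour = c(23, 23, 0, 0, 23, 23, 0, 0),
  zizi = 1:8
)

# One mean per (hour, day) combination instead of one per hour
by_day_hour <- aggregate(zizi ~ hour + day, data = df2, FUN = mean)

# Look up a single group, e.g. hour 23 on day 1
by_day_hour[by_day_hour$hour == 23 & by_day_hour$day == 1, ]
```

The result has one row per hour within each day, so the same hour on different days is kept separate.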

For more advanced versions of this, I would recommend investing some time in learning either dplyr or data.table, as both are excellent libraries for doing more complex versions of this extremely common task.

Also, for future reference, see "How to make a great R reproducible example?" for suggestions on how to pose a question more clearly.

user295691