I am trying to simplify my time series data by combining 30 time series into one overall series. In other words, I want to change my data from:

   Province       Date      Confirmed
1    Anhui 2020-01-21               0
2    Anhui 2020-01-22               0
3    Anhui 2020-01-23               0
4    Anhui 2020-01-24               6
5    Anhui 2020-01-25              24
6    Anhui 2020-01-26              45
7    Anhui 2020-01-27              55
8    Anhui 2020-01-28              91
9    Anhui 2020-01-29             137
10   Anhui 2020-01-30             200
11   Anhui 2020-01-31             237
12   Anhui 2020-02-01             297
13   Anhui 2020-02-02             340
14   Anhui 2020-02-03             408
15   Anhui 2020-02-04             480
16   Anhui 2020-02-05             530
17   Anhui 2020-02-06             591
18   Anhui 2020-02-07             665
19   Anhui 2020-02-08             733
20   Anhui 2020-02-09             779
21   Anhui 2020-02-10             830
22   Anhui 2020-02-11             860
23   Anhui 2020-02-12             889
24 Beijing 2020-01-21               5
25 Beijing 2020-01-22               5
26 Beijing 2020-01-23              10
27 Beijing 2020-01-24              20
28 Beijing 2020-01-25              35
29 Beijing 2020-01-26              47
30 Beijing 2020-01-27              59

To something like:

      Date      Confirmed
2020-01-21              5
2020-01-22              5
2020-01-23             10
2020-01-24             26
2020-01-25             59
2020-01-26             92
.
.
.

Basically, I want to sum the "Confirmed" values across all provinces for each date. Can somebody please guide me?

Student

1 Answer

We can use aggregate() from base R, with a formula that sums Confirmed within each Date:

aggregate(Confirmed ~ Date, df1, sum)

If 'Province' should also be kept as a grouping column, add it to the right-hand side of the formula:

aggregate(Confirmed ~ Date + Province, df1, sum)
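
For a self-contained check, here is a minimal sketch of the first call on a small data frame. The name df1 and the six rows are assumptions for illustration, copied from the first three dates of Anhui and Beijing in the question, not the full 30-province data.

```r
# Hypothetical sample: first three dates for two provinces from the question.
df1 <- data.frame(
  Province  = rep(c("Anhui", "Beijing"), each = 3),
  Date      = as.Date(rep(c("2020-01-21", "2020-01-22", "2020-01-23"), times = 2)),
  Confirmed = c(0, 0, 0, 5, 5, 10)
)

# Sum Confirmed over all provinces within each Date.
total <- aggregate(Confirmed ~ Date, data = df1, FUN = sum)
print(total)
```

The result has one row per date, with Confirmed holding the total across provinces. Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), so missing counts may need handling beforehand.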
akrun