0

I have a dataset which has three columns. Example data is in code below.

   Duration Amounts      Dates
1         9 -10.735 2017-07-30
2         4 156.247 2001-12-04
3         7  97.753 2014-02-12
4         1  99.190 1999-01-22
5         2 147.192 2014-12-25
6         7 141.061 2015-01-18
7         2 129.695 2016-05-07
8         3 145.949 2014-01-12
9         1 139.107 2017-03-09
10        5 103.728 2002-02-01
11        5   0.532 2001-09-09
12       10 130.991 2003-11-26
13        6  97.194 2014-05-13
14       10  92.210 2009-09-14
15        7  26.462 2014-09-14
16        9  76.092 2003-06-27
17        5 120.897 2012-07-09
18        5 167.934 2000-04-09
19        9  94.861 2012-12-09
20        9 119.384 2014-06-15
n <- 20
set.seed(1)        
Duration <- sample(10,size=n,replace=TRUE)
Amounts <- rnorm(n,100,50)
Dates <- sample(seq(as.Date('1999/01/01'), as.Date('2019/01/01'), by="day"), n,replace = TRUE)

My_Dataset <- data.frame(Duration,Amounts,Dates)

I wanted to sum the Amounts column by years in Date column and Indexes in Duration column.

I want to see the result to be for example for entry 18 to appear as follows:

      1   2   3   4   5 ...
2000  x   x   x   x   167.9
2001  x   x   x   x   x
2002  x   x   x   x   x
...
Cettt
  • 11,460
  • 7
  • 35
  • 58
Sam
  • 65
  • 1
  • 1
  • 8
  • @akrun Note this question involves aggregation and not just reshaping. It may be a duplicate, but not of the question marked. – zack May 16 '19 at 15:14
  • 1
    @zack I reopened. I didn't get your message, may be there is a space before the `@` – akrun May 17 '19 at 02:16

1 Answers1

3

you can use the tidyverse package like this:

library(tidyverse)

    My_Dataset %>% 
      mutate(Dates = format(Dates, "%Y")) %>% 
      group_by(Dates, Duration) %>% 
      summarise(Amounts = sum(Amounts)) %>% 
      spread(Duration, Amounts)


 # A tibble: 11 x 11
# Groups:   Dates [11]
   Dates   `1`   `2`   `3`     `4`   `5`   `6`   `7`   `8`   `9`  `10`
   <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 2000    NA    NA    NA   NA      NA    NA    141.  NA    NA    NA  
 2 2004    NA    NA    NA  119.     NA    NA     NA   NA    97.8  NA  
 3 2005    NA    NA   146.   0.532  NA    NA    104.  NA    NA   -10.7
 4 2006    NA    NA    NA   NA      NA    NA     NA   NA    NA    26.5
 5 2008    NA    NA    NA   NA      NA    68.9   NA  131.   NA    99.2
 6 2012    NA    NA   156.  NA      NA    NA     NA   NA    NA    NA  
 7 2014    NA    NA    NA   76.1    NA    NA    147.  NA    NA    NA  
 8 2015    NA   139.   NA   NA      NA    NA     NA   NA    NA    NA  
 9 2016   130.   NA    NA   NA      97.2  NA     NA   92.2  NA    NA  
10 2017    NA    NA   176.  NA      NA    NA     NA   NA    NA    NA  
11 2018    NA    NA    NA   NA      NA    NA     NA  121.   NA    NA 
Cettt
  • 11,460
  • 7
  • 35
  • 58
  • Thanks a lot! It worked. If I could ask additional question. If I have additional column with types of rows, how could I filter out certain rows? – Sam May 16 '19 at 15:28
  • Please don't ask follow up questions inside the comments. Rather ask a new question. – Cettt May 16 '19 at 15:37