I want to create a matrix from my data. My data consists of two columns: a date and my observation for that date. I want the matrix to have years as rows and days as columns, e.g.:

      17   18   19   20   ...   31
1904  x11  x12  ...
1905
1906
.
.
.
2019

The days in this case are the days of December each year. I would like missing values to be NA.

Here's a sample of my data:

> head(cdata)
# A tibble: 6 x 2
  Datum               Snödjup
  <dttm>                <dbl>
1 1904-12-01 00:00:00    0.02
2 1904-12-02 00:00:00    0.02
3 1904-12-03 00:00:00    0.01
4 1904-12-04 00:00:00    0.01
5 1904-12-12 00:00:00    0.02
6 1904-12-13 00:00:00    0.02

I figured that the first thing I need to do is to split the date into year, month and day (the dates are formatted YYYY-MM-DD), so I did that, dropped the date column (Datum), and also dropped the irrelevant days, namely those before the 17th.

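library(dplyr)  # provides %>%

# split the date into parts, then drop the original date column
cd <- cdata %>%
  dplyr::mutate(year = lubridate::year(Datum),
                month = lubridate::month(Datum),
                day = lubridate::day(Datum)) %>%
  dplyr::select(-Datum)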

cu <- cd[which(cd$day > 16 & cd$day < 32 & cd$month == 12), ]

and now it looks like this:

> cu
# A tibble: 1,284 x 4
   Snödjup  year month   day
     <dbl> <dbl> <dbl> <int>
 1    0.01  1904    12    26
 2    0.01  1904    12    27
 3    0.01  1904    12    28
 4    0.12  1904    12    29
 5    0.12  1904    12    30
 6    0.15  1904    12    31
 7    0.07  1906    12    17
 8    0.05  1906    12    18
 9    0.05  1906    12    19
10    0.04  1906    12    20
# … with 1,274 more rows

Now I need to fit my data into a matrix with missing values as NA. Is there any way to do this?

2 Answers

You can try:

library(dplyr)
library(tidyr)

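cdata %>%
   mutate(year = lubridate::year(Datum),
          month = lubridate::month(Datum),
          day = lubridate::day(Datum)) %>%
   # keep only 17-31 December
   filter(month == 12, day >= 17) %>%
   select(year, day, Snödjup) %>%
   # add an NA row for every missing year/day combination
   complete(year = full_seq(year, 1), day = 17:31) %>%
   pivot_wider(names_from = day, values_from = Snödjup)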
Ronak Shah
  • Thanks! This one kind of works, but it starts with day 1, I just want day 17 - 31 of December. It doesn't show missing values as NA, the days for the missing values are instead omitted. – user12221453 Apr 06 '20 at 19:48
  • @user12221453 Can you check the updated answer and see if it works. – Ronak Shah Apr 06 '20 at 23:57
  • I'm afraid not. The first year (1904) was omitted and the value of the observations seems random – user12221453 Apr 07 '20 at 09:29
  • OK. In that case, please give a reproducible example using `dput`, so that it is easier to help when we have your data: `dput(cdata)`. – Ronak Shah Apr 07 '20 at 09:30
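
For readers unfamiliar with the `dput` request in the comment above, here is a minimal sketch of what it does. The data frame `sample_rows` and its column `value` are stand-ins invented for illustration, not the asker's actual `cdata`.

```r
# Toy stand-in for a few rows of cdata (names are hypothetical)
sample_rows <- data.frame(
  Datum = as.POSIXct(c("1904-12-01", "1904-12-02", "1904-12-12"), tz = "UTC"),
  value = c(0.02, 0.02, 0.02)
)

# dput() prints R code that recreates the object; paste that output
# into the question so others can rebuild the data
dput(sample_rows)

# Round trip: capture the printed code and evaluate it again
txt <- paste(capture.output(dput(sample_rows)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

For a large data set, something like `dput(head(cdata, 30))` is usually enough, provided the subset still shows the problem (here, years with missing days).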

Base R approach, using `by`.

r <- `colnames<-`(do.call(rbind, by(dat, substr(dat$date, 1, 4), function(x) x[2])), 1:31)
r[,17:31]
#         17    18    19   20    21    22    23   24    25    26    27    28   29    30   31
# 1904 -0.28 -2.66 -2.44 1.32 -0.31 -1.78 -0.17 1.21  1.90 -0.43 -0.26 -1.76 0.46 -0.64 0.46
# 1905  1.44 -0.43  0.66 0.32 -0.78  1.58  0.64 0.09  0.28  0.68  0.09 -2.99 0.28 -0.37 0.19
# 1906 -0.89 -1.10  1.51 0.26  0.09 -0.12 -1.19 0.61 -0.22 -0.18  0.93  0.82 1.39 -0.48 0.65

Toy data

set.seed(42)
dat <- do.call(rbind, lapply(1904:1906, function(x) 
  data.frame(date=seq(ISOdate(x, 12, 1, 0), ISOdate(x, 12, 31, 0), "day" ),
             value=round(rnorm(31), 2))))
jay.sf
  • I tried all the code with the sample data and that worked fine, but then I tried with my own: `> r <- (do.call(rbind, by(cdata, substr(cdata$Datum, 1, 4), function(x) x[2])), 1:31) Error: unexpected ',' in "r <- (do.call(rbind, by(cdata, substr(cdata$Datum, 1, 4), function(x) x[2])),"` – user12221453 Apr 06 '20 at 22:34
  • Either `"colnames<-"(do.call(rbind, by(cdata, substr(cdata$Datum, 1, 4), function(x) x[2])), 1:31)` or `do.call(rbind, by(cdata, substr(cdata$Datum, 1, 4), function(x) x[2]))` – just count your parentheses. – jay.sf Apr 07 '20 at 04:09
  • Okay, this sort of works. But the row names just change between 1904 and 1905. So first row is 1904, second is 1905, third is 1904, fourth is 1905 and so on. – user12221453 Apr 07 '20 at 09:28
  • Please consider: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – jay.sf Apr 07 '20 at 10:29
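
The `by` one-liner above labels its columns 1:31, so it appears to assume that every year has all 31 December days, which the question's data does not. A minimal base R sketch that instead pre-allocates an NA matrix and fills it by (year, day) position might look like this. `december_matrix` and its arguments are hypothetical names, not part of either answer, and the toy vectors are made up for illustration.

```r
# Hypothetical helper: year x day matrix for December, NA where nothing was observed
december_matrix <- function(dates, values, days = 17:31) {
  lt    <- as.POSIXlt(dates)
  year  <- lt$year + 1900L      # POSIXlt counts years from 1900
  month <- lt$mon + 1L          # POSIXlt months are 0-based
  day   <- lt$mday
  keep  <- month == 12L & day %in% days

  # one row per calendar year between the first and last observed year
  years <- seq(min(year[keep]), max(year[keep]))
  m <- matrix(NA_real_, nrow = length(years), ncol = length(days),
              dimnames = list(as.character(years), as.character(days)))

  # fill observed cells using (row, column) index pairs
  m[cbind(match(year[keep], years), match(day[keep], days))] <- values[keep]
  m
}

# Toy data with gaps: no observations in 1905, and two dates outside 17-31 December
d <- as.Date(c("1904-12-17", "1904-12-31", "1906-12-18", "1906-11-30", "1906-12-05"))
v <- c(0.01, 0.15, 0.05, 0.99, 0.50)
m <- december_matrix(d, v)
```

With the question's data this would be called as `december_matrix(cdata$Datum, cdata$Snödjup)`. Years with no observations at all (apparently 1905 in the sample) come out as rows of NA, and if a (year, day) pair occurs more than once the last value wins.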