0

My data frame has different dates as rows. Every unique date occurs appr. 500 times. I want to make a new data frame where every column is a unique date and where the rows are all the observations of that date from my old dataset. So for every column dat represents a certain date, I should have appr. 500 rows that each represent a rel_spread from that day.

enter image description here

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
ameliedc
  • 13
  • 3

2 Answers2

1

You can use pivot_wider from tidyr:

library(tidyr)

pivot_wider(df, names_from = date, values_from = rel_spread, values_fn = list) %>%
  unnest(everything())
#> # A tibble: 2 x 17
#>   `20000103` `20000104` `20000105` `20000106` `20000107` `20000108` `20000109`
#>        <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1    -0.0234    -0.0128    0.00729     0.0408    -0.0298     0.0398     0.0445
#> 2     0.0492    -0.0120    0.0277      0.0435    -0.0288     0.0152    -0.0374
#> # ... with 10 more variables: `20000110` <dbl>, `20000111` <dbl>,
#> #   `20000112` <dbl>, `20000113` <dbl>, `20000114` <dbl>, `20000115` <dbl>,
#> #   `20000116` <dbl>, `20000117` <dbl>, `20000118` <dbl>, `20000119` <dbl>

Note that we don't have your data (and I wasn't about to transcribe a picture of your data), but I created a little reproducible data set which should match the structure of your data set, except it only has two values per date for demo purposes:

set.seed(1)
df <- data.frame(date = rep(as.character(20000103:20000119), 2),
                 rel_spread = runif(34, -0.05, 0.05))

df
#>        date    rel_spread
#> 1  20000103 -0.0234491337
#> 2  20000104 -0.0127876100
#> 3  20000105  0.0072853363
#> 4  20000106  0.0408207790
#> 5  20000107 -0.0298318069
#> 6  20000108  0.0398389685
#> 7  20000109  0.0444675269
#> 8  20000110  0.0160797792
#> 9  20000111  0.0129114044
#> 10 20000112 -0.0438213730
#> 11 20000113 -0.0294025425
#> 12 20000114 -0.0323443247
#> 13 20000115  0.0187022847
#> 14 20000116 -0.0115896282
#> 15 20000117  0.0269841420
#> 16 20000118 -0.0002300758
#> 17 20000119  0.0217618508
#> 18 20000103  0.0491906095
#> 19 20000104 -0.0119964821
#> 20 20000105  0.0277445221
#> 21 20000106  0.0434705231
#> 22 20000107 -0.0287857479
#> 23 20000108  0.0151673766
#> 24 20000109 -0.0374444904
#> 25 20000110 -0.0232779331
#> 26 20000111 -0.0113885907
#> 27 20000112 -0.0486609667
#> 28 20000113 -0.0117612043
#> 29 20000114  0.0369690846
#> 30 20000115 -0.0159651003
#> 31 20000116 -0.0017919885
#> 32 20000117  0.0099565825
#> 33 20000118 -0.0006458693
#> 34 20000119 -0.0313782399
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
1

Allan’s answer is perfect if you have the same number of rows for each date. If this isn’t the case, the following should work:

library(tidyr)
library(dplyr)

data_wide <- data_long %>%
  group_by(date) %>%
  mutate(daterow = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = date, values_from = rel_spread) %>%
  select(!daterow)
  
data_wide

Output:

# A tibble: 6 x 4
  `20000103` `20000104` `20000105` `20000106`
       <dbl>      <dbl>      <dbl>      <dbl>
1     -0.626      0.184     -0.836    -0.621 
2      1.60       0.330     -0.820    -2.21  
3      0.487      0.738      0.576     1.12  
4     -0.305      1.51       0.390    -0.0449
5     NA         NA         NA        -0.0162
6     NA         NA         NA         0.944 

Example data:

set.seed(1)

data_long <- data.frame(
  date = c(rep(20000103:20000105, 4), rep(20000106, 6)),
  rel_spread = rnorm(18)
)
zephryl
  • 14,633
  • 3
  • 11
  • 30