0

I have some data that looks like this:

structure(list(...1 = c(NA, "F00001092B - Return", NA, "F00001092A - Return", 
NA, NA, NA, "F000015112 - Return", NA, "0P0001KFG1 - Return", 
NA, "0P0001KFG2 - Return", NA, "0P0001KG31 - Return"), ...2 = c("43220", 
"7.1993099999999997", "43220", "9.9567099999999993", "-N/A", 
NA, "44074", "5.0628000000000002", "44104", "-7.2036899999999999", 
"44104", "-7.71122", "44074", "3.45444"), ...3 = c(43251, 2.50914, 
43251, 7.08661, NA, NA, 44104, -9.33194, 44135, -1.46045, 44135, 
-1.33158, 44104, -6.43119), ...4 = c(43281, 10.58262, 43281, 
9.74265, NA, NA, 44135, -2.18682, 44165, 19.35054, 44165, 19.1103, 
44135, -2.34933), ...5 = c(43312, -2.08777, 43312, -3.51759, 
NA, NA, 44165, 21.84286, 44196, -2.95694, 44196, -3.05413, 44165, 
21.73591), ...6 = c(43343, 15.81356, 43343, 19.44444, NA, NA, 
44196, -0.38177, 44227, 7.80132, 44227, 7.27166, 44196, -5.5202
), ...7 = c(43373, -3.663, 43373, -4.36047, NA, NA, 44227, 7.00191, 
44255, 5.01423, 44255, 8.44608, 44227, 6.63778), ...8 = c(43404, 
-1.46088, 43404, -3.19149, NA, NA, 44255, 4.64376, 44286, 11.27339, 
44286, 8.3409, 44255, 3.08164), ...9 = c(43434, -7.53453, 43434, 
-9.89011, NA, NA, 44286, 7.97628, 44316, 3.1574, 44316, 3.81616, 
44286, 9.15585), ...10 = c(43465, -3.08588, 43465, -4.00697, 
NA, NA, 44316, 6.94333, 44347, 4.03152, 44347, 2.85649, 44316, 
6.65098), ...11 = c(43496, -1.09915, 43496, -0.54446, NA, NA, 
44347, 5.04557, 44377, 6.4516, 44377, 6.63552, 44347, 2.31668
), ...12 = c(43524, 10.30018, 43524, 12.22628, NA, NA, 44377, 
2.83606, 44408, -3.87638, 44408, -3.88994, 44377, 5.59935), ...13 = c(43555, 
4.50815, 43555, 7.47968, NA, NA, 44408, -3.8261, 44439, -1.63488, 
44439, -0.88845, 44408, -4.49692)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14"))

Snippet of the Data Frame This data is essentially organized into groups of two rows. The top row in each group denotes date (looks like this because it was imported from excel), and the bottom row denotes the returns for a firm on the associated date.

I need to place this data into panel format where I have data that looks like the follwing:

Date  Firm  Return
10/2   X    2.04
10/3   X    2.07
10/2   Y    3.4
10/3   Y    4.2

Within groups of two, the data is wide form. Is it possible to pivot the data longer within every two rows then append this back together?

I have tried something like this to no avail:

#Create Row Identifiers to Group the Data
rets$two_days <- c(0, rep(1:(nrow(rets)-1)%/%2))

#Group by every two days and pivot wider
rets%>%group_by(two_days)%>%
  pivot_longer(cols = !1,names_to = "month", values_to = "ret")

Really stumped on how to tackle this problem.

TarJae
  • 72,363
  • 6
  • 19
  • 66
Matt Flynn
  • 11
  • 3
  • 1
    Images of data are not that useful for us to help. Try editing your question with some example data. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Axeman May 12 '23 at 16:12
  • Can you please post an expected result for your given example? – Andre Wildberg May 12 '23 at 16:36

2 Answers2

0

Here is how we could do it: (Not as trivial as one could thought, but it is doable):

Basically we split this one data frame in two. The first one contains the Returns and the second one contains the Dates. Finally we join them to the resulting data frame. And in between there is much coding fun:

library(tidyverse)
library(janitor)

df %>%
  filter(!is.na(...1)) %>% 
  mutate(across(-...1, as.numeric)) %>% 
  pivot_longer(cols = -...1,
               values_to = "Return") %>% 
  mutate(id = row_number()) %>% 
  left_join(df %>% 
              filter(is.na(...1)) %>% 
              filter(!if_any(-...1, ~is.na(.))) %>% 
              mutate(across(-...1, ~excel_numeric_to_date(as.numeric(.)))) %>% 
              pivot_longer(cols = -...1, 
                           values_to = "Date") %>% 
              mutate(id = row_number()), by="id") %>% 
  select(Date, Firm = ...1.x, Return) %>% 
  mutate(Firm = str_remove(Firm, " - Return"))

output:

 Date       Firm       Return
   <date>     <chr>       <dbl>
 1 2018-04-30 F00001092B   7.20
 2 2018-05-31 F00001092B   2.51
 3 2018-06-30 F00001092B  10.6 
 4 2018-07-31 F00001092B  -2.09
 5 2018-08-31 F00001092B  15.8 
 6 2018-09-30 F00001092B  -3.66
 7 2018-10-31 F00001092B  -1.46
 8 2018-11-30 F00001092B  -7.53
 9 2018-12-31 F00001092B  -3.09
10 2019-01-31 F00001092B  -1.10
# … with 62 more rows

data:

structure(list(...1 = c(NA, "F00001092B - Return", NA, "F00001092A - Return", 
NA, NA, NA, "F000015112 - Return", NA, "0P0001KFG1 - Return", 
NA, "0P0001KFG2 - Return", NA, "0P0001KG31 - Return"), ...2 = c("43220", 
"7.1993099999999997", "43220", "9.9567099999999993", "-N/A", 
NA, "44074", "5.0628000000000002", "44104", "-7.2036899999999999", 
"44104", "-7.71122", "44074", "3.45444"), ...3 = c(43251, 2.50914, 
43251, 7.08661, NA, NA, 44104, -9.33194, 44135, -1.46045, 44135, 
-1.33158, 44104, -6.43119), ...4 = c(43281, 10.58262, 43281, 
9.74265, NA, NA, 44135, -2.18682, 44165, 19.35054, 44165, 19.1103, 
44135, -2.34933), ...5 = c(43312, -2.08777, 43312, -3.51759, 
NA, NA, 44165, 21.84286, 44196, -2.95694, 44196, -3.05413, 44165, 
21.73591), ...6 = c(43343, 15.81356, 43343, 19.44444, NA, NA, 
44196, -0.38177, 44227, 7.80132, 44227, 7.27166, 44196, -5.5202
), ...7 = c(43373, -3.663, 43373, -4.36047, NA, NA, 44227, 7.00191, 
44255, 5.01423, 44255, 8.44608, 44227, 6.63778), ...8 = c(43404, 
-1.46088, 43404, -3.19149, NA, NA, 44255, 4.64376, 44286, 11.27339, 
44286, 8.3409, 44255, 3.08164), ...9 = c(43434, -7.53453, 43434, 
-9.89011, NA, NA, 44286, 7.97628, 44316, 3.1574, 44316, 3.81616, 
44286, 9.15585), ...10 = c(43465, -3.08588, 43465, -4.00697, 
NA, NA, 44316, 6.94333, 44347, 4.03152, 44347, 2.85649, 44316, 
6.65098), ...11 = c(43496, -1.09915, 43496, -0.54446, NA, NA, 
44347, 5.04557, 44377, 6.4516, 44377, 6.63552, 44347, 2.31668
), ...12 = c(43524, 10.30018, 43524, 12.22628, NA, NA, 44377, 
2.83606, 44408, -3.87638, 44408, -3.88994, 44377, 5.59935), ...13 = c(43555, 
4.50815, 43555, 7.47968, NA, NA, 44408, -3.8261, 44439, -1.63488, 
44439, -0.88845, 44408, -4.49692)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14"))
TarJae
  • 72,363
  • 6
  • 19
  • 66
0

Here's an approach where we start off assigning each row to a "type" of either Return or Date, and a "group" relating to its eventual output row. Then we can reshape long and then wide by "type" to get the shape we want.

library(tidyverse)
df %>%
  mutate(type = if_else(row_number() %% 2 == 0, "Return", "Date"),
         group = (row_number() + 1) %/% 2, .before = 0) %>%
  mutate(across(4:last_col(), as.numeric)) %>%
  fill(`...1`, .direction = "up") %>%
  pivot_longer(4:last_col()) %>%
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(Date = janitor::excel_numeric_to_date(Date))

Result

# A tibble: 84 × 5
   group ...1                name  Date       Return
   <dbl> <chr>               <chr> <date>      <dbl>
 1     1 F00001092B - Return ...2  2018-04-30   7.20
 2     1 F00001092B - Return ...3  2018-05-31   2.51
 3     1 F00001092B - Return ...4  2018-06-30  10.6 
 4     1 F00001092B - Return ...5  2018-07-31  -2.09
 5     1 F00001092B - Return ...6  2018-08-31  15.8 
 6     1 F00001092B - Return ...7  2018-09-30  -3.66
 7     1 F00001092B - Return ...8  2018-10-31  -1.46
 8     1 F00001092B - Return ...9  2018-11-30  -7.53
 9     1 F00001092B - Return ...10 2018-12-31  -3.09
10     1 F00001092B - Return ...11 2019-01-31  -1.10
Jon Spring
  • 55,165
  • 4
  • 35
  • 53