In R, is there a way to gather a data frame while only extracting part of the column name?

Question

Overview

I'm looking to tidy my data frame. I have found a solution to my problem, but it seems highly inefficient on my large dataset. Currently, my code gathers the data frame, uses separate() to split the ticker from the metric in each column name, and then spreads the data back out by metric. See the example below.

Data frame

df <- structure(list(date = c("2009-07-01", "2009-07-02", "2009-07-06", 
"2009-07-07", "2009-07-08"), PRED.Open = c(0.5, 0.5, 0.7, 0.7, 
0.7), PRED.High = c(0.5, 0.6, 0.7, 0.7, 0.7), PRED.Low = c(0.5, 
0.5, 0.5, 0.7, 0.7), PRED.Close = c(0.5, 0.6, 0.5, 0.7, 0.7), 
    PRED.Volume = c(0L, 300L, 200L, 0L, 0L), PRED.Adjusted = c(0.5, 
    0.6, 0.5, 0.7, 0.7), GDM.Open = c(1041.02002, 1085.109985, 
    1052.02002, 1011.429993, 1006.630005), GDM.High = c(1097.790039, 
    1085.109985, 1052.02002, 1029.290039, 1006.630005), GDM.Low = c(1041.02002, 
    1038.540039, 995.450012, 1005.280029, 948.73999), GDM.Close = c(1085.109985, 
    1052.02002, 1011.429993, 1006.630005, 966.22998), GDM.Volume = c(0L, 
    0L, 0L, 0L, 0L), GDM.Adjusted = c(1085.109985, 1052.02002, 
    1011.429993, 1006.630005, 966.22998), NBL.Open = c(29.885, 
    29.325001, 27.370001, 27.485001, 26.815001), NBL.High = c(30.35, 
    29.325001, 27.545, 27.610001, 27.18), NBL.Low = c(29.83, 
    28.07, 26.825001, 26.605, 25.745001)), row.names = c(NA, 
-5L), class = "data.frame")

Current Solution

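library(dplyr)
library(tidyr)

# Stack every non-date column, split "TICKER.Metric" names into two columns,
# then widen again so each metric gets its own column
df <- df %>%
  gather(key = "ticker", value = "val", -date) %>%
  separate(col = "ticker", into = c("ticker", "metric"), sep = "\\.") %>%
  spread(key = "metric", value = "val") %>%
  arrange(ticker, date)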

Desired Outcome

Question

Is there a more efficient way to accomplish this?

score 1 · Accepted Answer · answered Sep 11 '20 at 02:28

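If you use pivot_longer() from tidyr 1.0.0 or later, you can do this in a single call. The special ".value" entry in names_to tells pivot_longer() to use the part of each column name after the dot (Open, High, ...) as the name of an output column, so no separate spread step is needed: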

tidyr::pivot_longer(df, 
                    cols = -date, 
                    names_to = c('ticker', '.value'), 
                    names_sep = '\\.') %>%
dplyr::arrange(ticker, date)

# A tibble: 15 x 8
#   date       ticker     Open     High      Low   Close Volume Adjusted
#   <chr>      <chr>     <dbl>    <dbl>    <dbl>   <dbl>  <int>    <dbl>
# 1 2009-07-01 GDM    1041.0   1097.8   1041.0   1085.1       0  1085.1 
# 2 2009-07-02 GDM    1085.1   1085.1   1038.5   1052.0       0  1052.0 
# 3 2009-07-06 GDM    1052.0   1052.0    995.45  1011.4       0  1011.4 
# 4 2009-07-07 GDM    1011.4   1029.3   1005.3   1006.6       0  1006.6 
# 5 2009-07-08 GDM    1006.6   1006.6    948.74   966.23      0   966.23
# 6 2009-07-01 NBL      29.885   30.35    29.83    NA        NA    NA   
# 7 2009-07-02 NBL      29.325   29.325   28.07    NA        NA    NA   
# 8 2009-07-06 NBL      27.370   27.545   26.825   NA        NA    NA   
# 9 2009-07-07 NBL      27.485   27.610   26.605   NA        NA    NA   
#10 2009-07-08 NBL      26.815   27.18    25.745   NA        NA    NA   
#11 2009-07-01 PRED      0.5      0.5      0.5      0.5       0     0.5 
#12 2009-07-02 PRED      0.5      0.6      0.5      0.6     300     0.6 
#13 2009-07-06 PRED      0.7      0.7      0.5      0.5     200     0.5 
#14 2009-07-07 PRED      0.7      0.7      0.7      0.7       0     0.7 
#15 2009-07-08 PRED      0.7      0.7      0.7      0.7       0     0.7
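
To see what ".value" is doing in isolation, here is a minimal sketch on a toy data frame. The data frame `toy` and the tickers AAA and BBB are made up for illustration and are not part of the question's data; only pivot_longer() and its names_to / names_sep arguments come from the answer above.

```r
library(tidyr)

# Hypothetical two-ticker frame, much smaller than the question's data
toy <- data.frame(
  date      = c("2009-07-01", "2009-07-02"),
  AAA.Open  = c(1, 2),
  AAA.Close = c(3, 4),
  BBB.Open  = c(5, 6),
  BBB.Close = c(7, 8)
)

# Split each column name at the dot: the first piece goes into a new
# "ticker" column, the second piece (".value") becomes an output column name
long <- pivot_longer(
  toy,
  cols      = -date,
  names_to  = c("ticker", ".value"),
  names_sep = "\\."
)

print(long)
```

The result should have one row per date and ticker (four rows here), with Open and Close as ordinary columns next to date and ticker.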
