0

Below is a small example of the type of data that I'm working with. The real data set is about 8 million rows with about 450 stat names in the Stat_Name column for every single date from 8/31/2019 - 1/10/2020. All I'm needing to do is get a function that would convert df1 below to df2. I bet it's fairly straightforward and I think melt() might be able to accomplish this but I'm unsure. Appreciate all of your help ahead of time! Thanks.

rm(list=ls())

df1 <- data.frame(Team_Code = c(728,728,728,728),
                  Coef_Name = c('Team_728','Team_728','Team_728','Team_728'),
                  Year = 2021,
                  Date = c('8/31/2021','8/31/2021','9/1/2021','9/1/2021'),
                  Stat_Name = c('Points','OppPoints','Points','OppPoints'),
                  Ridge_Reg_Coef = c(20,15,22,16),
                  Adj_Stat_Value = c(21.5,14,20.5,17))

df2 <- data.frame(Team_Code = c(728,728),
                  Coef_Name = c('Team_728','Team_728'),
                  Year = 2021,
                  Date = c('8/31/2021','9/1/2021'),
                  Points = c(21.5,20.5),
                  OppPoints = c(14,17))
Lcsballer1
  • 97
  • 8

1 Answers1

1

A tidyverse solution

 df1 %>% 
  # Removing Ridge_Reg_Coef
  select(-Ridge_Reg_Coef) %>% 
  # Pivotting data to a wider format, using Stat_name as variable and Adj_Stat_Value as values
  pivot_wider(names_from = Stat_Name,values_from = Adj_Stat_Value)

# A tibble: 2 x 6
  Team_Code Coef_Name  Year Date      Points OppPoints
      <dbl> <chr>     <dbl> <chr>      <dbl>     <dbl>
1       728 Team_728   2021 8/31/2021   21.5        14
2       728 Team_728   2021 9/1/2021    20.5        17
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32