
I am working with a nested data frame of football data, with a model fitted to each nested data frame. Using goalmodel::predict_result, I want to predict the outcome of each match from the corresponding model. The predict_result function requires three arguments: the model, the vector of home teams, and the vector of away teams. How do I reference a column inside a nested data frame? Here is an example of my nested data frame:

library(goalmodel)
library(tidyverse)

nested_df
# A tibble: 4 x 3
# Groups:   League [4]
  League                     data model     
  <chr>          <list<df[,133]>> <list>    
1 F1             [380 x 133] <goalmodl>
2 E0             [380 x 133] <goalmodl>
3 SP1            [380 x 133] <goalmodl>
4 D1             [308 x 133] <goalmodl>

If I, say, wanted to predict the results of the first element of this dataframe, which is F1, I would write

predict_result(nested_df$model[[1]], team1=nested_df$data[[1]]$HomeTeam, team2=nested_df$data[[1]]$AwayTeam, return_df = TRUE)

which returns a dataframe of the desired outcome. I have tried iterating the above function with purrr::map using:

map(nested_df,~predict_result(.x$model,
                                 team1=.x$data[[.]]$HomeTeam,
                                 team2=.x$data[[.]]$AwayTeam,
                                 return_df = TRUE))

This does not work; the error is:

Error in .x$data : $ operator is invalid for atomic vectors

I would appreciate any help and suggestions, thanks in advance.

Here is an example for reproducibility:

df <- tibble(League = c("F1","E0","SP1","D1"),
             HomeTeam = c("TeamA","TeamB","TeamC","TeamD"),
             AwayTeam = c("TeamE","TeamF","TeamG","TeamH"),
             FTHG = c(0,1,2,0),
             FTAG = c(0,1,0,2))

nested_df <- df %>%
  group_by(League)%>%
  nest()%>%
  mutate(model = map(data,~goalmodel::goalmodel(goals1 = .x$FTHG, goals2 = .x$FTAG,
                                     team1 = .x$HomeTeam, team2 = .x$AwayTeam,
                                     rs=TRUE)))

nested_df
# A tibble: 4 x 3
# Groups:   League [4]
  League           data model     
  <chr>  <list<df[,4]>> <list>    
1 F1            [1 x 4] <goalmodl>
2 E0            [1 x 4] <goalmodl>
3 SP1           [1 x 4] <goalmodl>
4 D1            [1 x 4] <goalmodl>

I would like to apply goalmodel::predict_result to all four data frames in nested_df, each with its respective goalmodl object. Thanks in advance.

Chewyham
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Dec 15 '19 at 06:48
  • @Tung Thank you, and sorry: I am pretty new to this, so my question was not very clear. I have edited the question; please see if it is clearer now. Thanks. – Chewyham Dec 15 '19 at 14:38

1 Answer


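map() iterates over the columns of a data frame, not over its rows, so on the first iteration .x is the League character vector and `$` fails with the "atomic vectors" error. To iterate over the model and data list columns in parallel, use the map2 function: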

map2(.x = nested_df$model,
     .y = nested_df$data,
     .f = ~ predict_result(model = .x,
                           team1 = .y$HomeTeam,
                           team2 = .y$AwayTeam,
                           return_df = TRUE))

This returns a list of data frames, one per league.
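The column-wise behaviour behind the original error can be seen in a minimal sketch. The tibble `toy` and its values are made up for illustration and are not part of the question's data:

```r
library(tibble)
library(purrr)

# Toy tibble (made-up values) with one ordinary column and one list column
toy <- tibble(
  League = c("F1", "E0"),
  data   = list(tibble(HomeTeam = "TeamA"), tibble(HomeTeam = "TeamB"))
)

# map() treats the tibble as a list of columns, so .x is a whole column
col_classes <- map_chr(toy, ~ class(.x)[1])
col_classes

# The first column is an atomic character vector, so `$` fails on it;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(toy$League$data, error = function(e) conditionMessage(e))
res
```

Because each iteration sees one entire column rather than one row, a single `.x` can never hold both the model and its matching data, which is why two inputs have to be passed side by side.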

Alternatively, store the predictions in nested_df itself by calling map2 inside mutate:

nested_df <- nested_df %>% 
  mutate(pred = map2(.x = model,
                     .y = data,
                     .f = ~ predict_result(model = .x,
                                           team1 = .y$HomeTeam,
                                           team2 = .y$AwayTeam,
                                           return_df = TRUE)))

This adds a list column, pred, to nested_df. With unnest you can expand it into a regular data frame.
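A rough sketch of the unnest step follows. Since goalmodel may not be installed, `toy_fit` and `toy_predict` are hypothetical stand-ins for goalmodel() and predict_result(), and the match data and the output column `expg1` are made up; only the map2 and unnest pattern is the point here:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up match data: two matches per league
matches <- tibble(
  League   = c("F1", "F1", "E0", "E0"),
  HomeTeam = c("TeamA", "TeamB", "TeamC", "TeamD"),
  AwayTeam = c("TeamB", "TeamA", "TeamD", "TeamC"),
  FTHG     = c(2, 0, 1, 3)
)

# Hypothetical stand-ins for goalmodel() and predict_result()
toy_fit <- function(d) list(mean_home_goals = mean(d$FTHG))
toy_predict <- function(model, team1, team2) {
  tibble(team1 = team1, team2 = team2, expg1 = model$mean_home_goals)
}

# One row per league, with a model and a prediction data frame in list columns
nested <- matches %>%
  group_by(League) %>%
  nest() %>%
  mutate(model = map(data, toy_fit),
         pred  = map2(model, data, ~ toy_predict(.x, .y$HomeTeam, .y$AwayTeam)))

# Keep the key and the predictions, then expand to one row per match
flat <- nested %>%
  ungroup() %>%
  select(League, pred) %>%
  unnest(pred)
flat
```

Selecting only League and pred before unnest keeps the data and model list columns out of the flattened result, so each output row is one match labelled with its league.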

ricoderks