0

I have a datatable with 96 different variables, including weekly attendance at NFL games of 17 different weeks.

The colnames of df looks like this:

colnames(df)
 [1] "NFL_team_name"        "year"                 "season_performance"   "margin_of_victory"    "strength_of_schedule"
 [6] "simple_rating"        "offensive_ranking"    "defensive_ranking"    "playoffs"             "sb_winner"           
[11] "price"                "weekly_attendance.1"  "day.1"                "time.1"               "home_ind.1"          
[16] "winner.1"             "weekly_attendance.2"  "day.2"                "time.2"               "home_ind.2"          
[21] "winner.2"             "weekly_attendance.4"  "day.4"                "time.4"               "home_ind.4"          
[26] "winner.4"             "weekly_attendance.5"  "day.5"                "time.5"               "home_ind.5"          
[31] "winner.5"             "weekly_attendance.6"  "day.6"                "time.6"               "home_ind.6"  

and so on..

Some of the weekly attendance columns have NA's and there I want to put in the mean of the rest of the weekly attendance columns, based on the row number. The weekly attendance columns is 12,17,22,27... as seen below. I have tried something like the following, but I don't really know how to get it to work:

all rows with weekly att.:

mean(df[1,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)])

the means the weekly attendance columns of each row (team & year):

rowmeans <- as.data.table(rowMeans(df[,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)], na.rm = T))

use the rowmeans to replace na's (SOMETHING LIKE THIS):

for (i in 1:nrow(df)) {
  if (is.na(df[i,])) {
    df[i,] <- rowmeans[i,]
  }
  else
    next
}

So what I want, is to fill in the NA's in each row, based on the mean of the weekly attendance columns in each row.

Hope it makes sense, and that some of you can tell me what is missing.

1 Answers1

0

Hard to be certain without a Minimal Reproducible Example, but you might try something like this:

# find column names with weekly attendance figures
wa_cols <- grep("weekly_attendance", colnames(df), value=TRUE)

# calculate the mean for each row for just those columns
wa_mean <- rowMeans(df[, wa_cols], na.rm=TRUE)

# loop over weekly attendance columns, filling in if missing
for (x in wa_cols) {
  df[[x]] <- ifelse(is.na(df[[x]]), wa_mean, df[[x]])
}
Vincent
  • 15,809
  • 7
  • 37
  • 39