0

I have a data set of different cities and the different sports teams they have:

**City**    **Basketball**  **Soccer**    **Tennis**    **Football**
   AA         Titans          Astros       Royals         Eagles
   AA         Crimson           NA         Falcons        Packers
   AA           NA            Phoenix      Tigers           NA
   AA         Goblins           NA         Lions            NA
   BB           NA              NA           NA             NA
   BB         Bears           Rockets      Broncos          NA
   CC         Pelicans        Wildfires    Panthers         NA
   CC           NA            Bobcats      Blizzard         NA
   CC           NA              NA         Tornadoes        NA
   CC           NA              NA         Hurricanes       NA

I want to group the data by city and find the total number of NAs for each columnn for each city. How would I go about doing this? I've tried this after doing some research

aggregate(df[,2:4], by=list(df$City), colSums(is.na(df)))

but it doesn't work...

tmoneyz
  • 35
  • 5

2 Answers2

2

Base R is your enemy here.

data.table is friendlier:

library(data.table)
setDT(df) # <- convert to data.table
# going column-by-column, count NA
df[ , lapply(.SD, function(x) sum(is.na(x))), by = City]

See Getting Started with data.table, a primer on .SD, and this on the use of lapply(.SD,...) for more.

Note well that the use of colSums necessitates converting your data.frame to a matrix, which will force all the columns to have the same class (here, character) if they don't already, which can be costly.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
1

Your aggregate syntax was a bit off. Use a correct custom anonymous function and it should work:

aggregate(df[,2:4], by=list(df$City), FUN=function(x) { sum(is.na(x)) })

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360