
I am working on a problem for a statistics class that uses baseball team data such as attendance, wins/losses, and other team stats. The problem statement calls for creating variables for winning teams (81 or more wins), losing teams (fewer than 81 wins), and attendance in three categories: less than 2 million, between 2 and 3 million, and more than 3 million.

The raw data is keyed by team name, with one row per team and the stats in the columns.
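A minimal base R sketch of the two grouping variables, assuming a data frame named `teams` with hypothetical columns `team`, `wins`, and `attend` (the names and the six rows of toy data are made up for illustration). `cut()` bins attendance; with `right = FALSE` the intervals are "below 2M", "2M up to but not including 3M", and "3M and up", so a team with exactly 3,000,000 lands in High. Shift the breaks if your assignment treats that boundary differently.

```r
# Hypothetical example data: one row per team, stats in columns.
# Column names (team, wins, attend) are assumptions, not from the assignment.
teams <- data.frame(
  team   = c("A", "B", "C", "D", "E", "F"),
  wins   = c(95, 70, 81, 60, 88, 79),
  attend = c(3200000, 1800000, 2500000, 1500000, 2900000, 3100000)
)

# Winning season: 81 or more wins ("Yes"), otherwise "No"
teams$winning <- factor(ifelse(teams$wins >= 81, "Yes", "No"),
                        levels = c("Yes", "No"))

# Attendance category: [-Inf, 2M) = Low, [2M, 3M) = Med, [3M, Inf) = High
teams$attendance <- cut(
  teams$attend,
  breaks = c(-Inf, 2e6, 3e6, Inf),
  labels = c("Low", "Med", "High"),
  right  = FALSE  # intervals are closed on the left, open on the right
)

print(teams)
```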

I then need to create a table counting the number of teams along those two dimensions, like this:

Winning Season    Low Attendance  Med. Attendance  High Attendance  
Yes               3               12               3
No                2               10               2
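Once both variables are factors, base R's `table()` produces this kind of cross-tabulation directly. This sketch builds a small hypothetical `teams` data frame (made-up teams and numbers; the column names `wins` and `attend` are assumptions) so it runs on its own.

```r
# Hypothetical toy data; replace with the real team data
teams <- data.frame(
  team   = c("A", "B", "C", "D", "E", "F"),
  wins   = c(95, 70, 81, 60, 88, 79),
  attend = c(3200000, 1800000, 2500000, 1500000, 2900000, 3100000)
)

# Grouping variables as factors with fixed level order
teams$winning <- factor(ifelse(teams$wins >= 81, "Yes", "No"),
                        levels = c("Yes", "No"))
teams$attendance <- cut(teams$attend,
                        breaks = c(-Inf, 2e6, 3e6, Inf),
                        labels = c("Low", "Med", "High"),
                        right  = FALSE)

# Cross-tabulate: rows = winning season, columns = attendance category
counts <- table(`Winning Season` = teams$winning,
                Attendance       = teams$attendance)
print(counts)
```

Because both variables are factors with fixed levels, a category with no teams still shows up as 0 instead of being dropped. `addmargins(counts)` adds row and column totals if you need them.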

We can use whatever tool we like, and I am using R and RStudio so that I can learn stats and R at the same time. However, I can't figure out how to do it or which function(s) to use to build a table with those aggregate counts.

I have looked at data.table, dplyr, and other packages, but I cannot figure out how to get the team counts broken down by those categories. If this were SQL, I could write

select count(*) from table where attend < 2000000 and wins < 81 

and then programmatically create the table. I can't figure out how to do the same in R.
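The SQL query translates fairly directly to R: a comparison on a column yields one TRUE/FALSE per row, and `sum()` counts the TRUEs (TRUE is treated as 1). A sketch with hypothetical toy data and assumed column names `wins` and `attend`:

```r
# Hypothetical toy data; column names are assumptions
teams <- data.frame(
  team   = c("A", "B", "C", "D", "E", "F"),
  wins   = c(95, 70, 81, 60, 88, 79),
  attend = c(3200000, 1800000, 2500000, 1500000, 2900000, 3100000)
)

# Equivalent of: select count(*) from table where attend < 2000000 and wins < 81
# `&` combines the two conditions row by row; sum() counts the TRUE rows.
n_low_losing <- sum(teams$attend < 2e6 & teams$wins < 81)

# Alternative: keep only the matching rows, then count them
n_low_losing2 <- nrow(subset(teams, attend < 2e6 & wins < 81))

print(n_low_losing)
```

Repeating this for every combination works, but `table()` on two categorical variables computes all the cells at once.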

Thank you for any help.

Edited by camille; asked by user11504995.
  • https://www.r-bloggers.com/data-manipulation-with-r-spector-2008/ – Dij May 15 '19 at 15:20
  • As of right now, this is too broad. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on writing a reproducible example, including the code you've tried so far, even if it doesn't work. – camille May 16 '19 at 23:47
  • With `dplyr`, something like ``your_data %>% group_by(`Winning Season`) %>% summarize_all(sum)``, assuming you've already got, e.g. Low, Med, and High attendance as binary columns. Can't really tell though, because without a reproducible example who knows what your data looks like! – Gregor Thomas May 17 '19 at 01:08
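The comment's idea (binary Low/Med/High columns, then sum them within each winning-season group) can also be done without dplyr. This is a swapped-in base R analogue using `aggregate()`, not the commenter's code; the toy data and the column names `wins`, `attend`, `winning`, `Low`, `Med`, and `High` are hypothetical.

```r
# Hypothetical toy data; column names are assumptions
teams <- data.frame(
  team   = c("A", "B", "C", "D", "E", "F"),
  wins   = c(95, 70, 81, 60, 88, 79),
  attend = c(3200000, 1800000, 2500000, 1500000, 2900000, 3100000)
)
teams$winning <- ifelse(teams$wins >= 81, "Yes", "No")

# Binary (0/1) indicator columns, as the comment assumes
teams$Low  <- as.integer(teams$attend < 2e6)
teams$Med  <- as.integer(teams$attend >= 2e6 & teams$attend < 3e6)
teams$High <- as.integer(teams$attend >= 3e6)

# Base R analogue of group_by(winning) %>% summarize(sum of each indicator)
summary_tbl <- aggregate(cbind(Low, Med, High) ~ winning,
                         data = teams, FUN = sum)
print(summary_tbl)
```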
