
I have a data frame of all NBA players, their teams, and their points per game. I want to create a new data frame that lists team names in the first column, followed by five columns holding the points per game of each team's five leading scorers.

For example (made-up numbers):

ATL 17.2 14.3 12.2 10.2 9.4

I'm trying to work through what might get me there. I'm thinking I need to create subsets of the first data frame for each team (listing each of their scorers), then sort all 30 data frames and then move the first 5 values in the pts per game column into a new data frame using [0:4].

Is there an easy way to use a for loop to create all 30 data frames? Maybe if I created a list for each team name and then did something like....

for i in list:
    create data frame i from ALLPLAYERS[TEAM = i]

Then I could use some other sort to sort them and add them into the final data frame.

Sorry, I know the "code" portion above isn't really code. It's just what I'm thinking; I still need to find the exact syntax.
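The per-team subsetting described above can be sketched in base R without an explicit loop. This is only a minimal sketch under assumptions: the data frame is called ALLPLAYERS with a TEAM column as in the pseudocode, PPG is a hypothetical name for the points-per-game column, and the data are made up.

```r
# Made-up stand-in for the real data; PPG is an assumed column name.
ALLPLAYERS <- data.frame(
  PLAYER = 1:100,
  TEAM   = rep(LETTERS[1:10], each = 10),
  PPG    = 1:100
)

# split() returns one vector of PPG values per team, replacing the for loop.
ppg_by_team <- split(ALLPLAYERS$PPG, ALLPLAYERS$TEAM)

# For each team, sort descending and keep the first five values.
# sapply() returns one column per team, so t() turns teams into rows.
top5 <- t(sapply(ppg_by_team, function(x) head(sort(x, decreasing = TRUE), 5)))

# Assemble the final data frame with the team name as the first column.
result <- data.frame(TEAM = rownames(top5), top5, row.names = NULL)
```

This assumes every team has at least five players; with fewer, sapply() would return a list rather than a matrix and the reshaping step would need adjusting.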

TryHarder01
  • Please follow these guidelines when posting: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lukeA Mar 04 '15 at 12:52
  • As you see from the answers, this can be accomplished more neatly without a for-loop. See Circle 3 of the [R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – Sam Firke Mar 04 '15 at 15:03

3 Answers


This works using data.table.

library(data.table)
nba <- data.table(player = 1:100,
                  team = rep(LETTERS[1:10], each = 10),
                  ppg = 1:100)
nba[, as.list(tail(sort(ppg), 5)), by = team]

The points-per-game values are unrealistic, but they make it easy to see what is happening. Note that tail(sort(ppg), 5) lists each team's top five values in ascending order.
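If the columns should run from highest to lowest scorer, as in the question's example row, one possible variation is to order the rows before grouping and take the head of each group. This is a sketch using the same fake data; the column names ppg1 to ppg5 are made up for illustration.

```r
library(data.table)

# Same fake data as above: 10 teams of 10 players each.
nba <- data.table(player = 1:100,
                  team = rep(LETTERS[1:10], each = 10),
                  ppg = 1:100)

# Order rows by descending ppg first, then take the first five per team.
top5 <- nba[order(-ppg), as.list(head(ppg, 5)), by = team]

# Replace the default V1..V5 names with illustrative ones.
setnames(top5, c("team", paste0("ppg", 1:5)))
```

Because the rows are sorted before grouping, the teams appear in order of their highest scorer rather than alphabetically.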

Arun
DaveTurek

Here's some example code for one strategy (top 2 scorers):

set.seed(123)
df <- data.frame(team = LETTERS[1:2], player = replicate(8, paste0(sample(letters, 5, T), collapse = "")), score = sample(1:20, 8, T))
aggregate(score~team, data = df[order(-df$score), ], head, 2)
#   team score.1 score.2
# 1    A       9       5
# 2    B      10       9 
lukeA

Using the dplyr and tidyr packages (loaded with library(dplyr) and library(tidyr)), along with the fake data generated by DaveTurek above, here is a step-by-step solution:

Generate fake data:

nba=data.frame(player=1:100,team=rep(LETTERS[1:10],each=10),ppg=1:100)

Select only the top 5 scorers per team by grouping, sorting, and slicing:

top_scorers <- nba %>% group_by(team) %>% arrange(-ppg) %>% slice(1:5)

Create a new variable called scoreRank that assigns their rank within the team, where 1 is highest scoring and 5 is 5th highest scoring:

top_scorers <- top_scorers %>% group_by(team) %>% mutate(scoreRank = rank(-ppg))

Drop the player name column and cast as a data frame (the latter is necessary due to a bug in dplyr):

top_scorers <- as.data.frame(top_scorers %>% select(-player))

Spread the data frame into the desired wide format, instead of its current long format:

result <- spread(top_scorers,scoreRank,ppg)
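With real scoring data, two players on a team can have the same points per game, in which case rank() returns fractional ranks such as 1.5 and the spread columns no longer line up as 1 to 5. The sketch below is one possible way around that, assuming a recent dplyr (1.0 or later) and tidyr (1.0 or later): row_number() breaks ties arbitrarily, and pivot_wider() does the reshaping. The ppg prefix on the column names is an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Same fake data as above.
nba <- data.frame(player = 1:100, team = rep(LETTERS[1:10], each = 10), ppg = 1:100)

result_wide <- nba %>%
  group_by(team) %>%
  # Keep exactly five rows per team, even if there are ties.
  slice_max(ppg, n = 5, with_ties = FALSE) %>%
  # Integer rank within team: 1 = highest ppg, ties broken by row order.
  mutate(scoreRank = row_number(desc(ppg))) %>%
  ungroup() %>%
  select(-player) %>%
  # One row per team, one column per rank.
  pivot_wider(names_from = scoreRank, values_from = ppg, names_prefix = "ppg")
```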
Sam Firke