
I have a listing of New York Mets baseball players from the Lahman database, in alphabetical order. For each player, the rows are sorted by the year he played, in ascending order. For each player, I need to extract just the row for the first year he played and put all of those first rows into a new data frame.

On my Mac in RStudio I have gotten to the point where the data I need is grouped and ordered. Here is a sample.

playerID,yearID,G,AB,R,H
aceveju01,1997,25,6,0,0
acostma01,2010,41,0,0,0
acostma01,2011,44,0,0,0
acostma01,2012,45,0,0,0
adkinjo01,2007,1,0,0,0
agbaybe01,1998,11,15,1,2
agbaybe01,1999,101,276,42,79
agbaybe01,2000,119,350,59,101
agbaybe01,2001,91,296,28,82
ageeto01,1968,132,368,30,80
ageeto01,1969,149,565,97,153
ageeto01,1970,153,636,107,182
ageeto01,1971,113,425,58,121
ageeto01,1972,114,422,52,96
aguilch01,2008,8,12,0,2

For testing purposes, I wrote the code step by step instead of using pipes. This is as far as I have been able to get.

library(dplyr)

Lahman_batting18 <- read.csv('Batting-copy.csv', header = TRUE, stringsAsFactors = FALSE)
# Keep columns playerID through SO, then only the Mets rows (teamID 'NYN'), sorted by player and year
Lahman_batting18s <- select(Lahman_batting18, playerID:SO)
Lahman_batting18f <- filter(Lahman_batting18s, teamID == 'NYN')
Lahman_batting18fa <- arrange(Lahman_batting18f, playerID, yearID)

Desired output:

playerID,yearID,G,AB,R,H
aceveju01,1997,25,6,0,0
acostma01,2010,41,0,0,0
adkinjo01,2007,1,0,0,0
agbaybe01,1998,11,15,1,2
ageeto01,1968,132,368,30,80
aguilch01,2008,8,12,0,2

Thanks for your help!

massisenergy
Metsfan

1 Answer


d.b used base R, whereas I prefer dplyr and pipes.

Lahman_batting18 %>%
  group_by(playerID) %>%
  arrange(playerID, yearID) %>%
  filter(yearID == min(yearID))

This keeps only the rows where yearID equals each player's minimum year. I hope this is what you want. Here is the output I get using your sample data:

# A tibble: 6 x 6
# Groups:   playerID [6]
  playerID  yearID     G    AB     R     H
  <fct>      <int> <int> <int> <int> <int>
1 aceveju01   1997    25     6     0     0
2 acostma01   2010    41     0     0     0
3 adkinjo01   2007     1     0     0     0
4 agbaybe01   1998    11    15     1     2
5 ageeto01    1968   132   368    30    80
6 aguilch01   2008     8    12     0     2
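
One caveat worth checking: filter(yearID == min(yearID)) keeps every row tied for a player's earliest year, whereas slice(1) keeps exactly one row per player. The sketch below uses a few rows from the sample plus one hypothetical duplicate-year row for acostma01 (not in the original data) to show the difference. It assumes the full table can contain more than one row per player per year, for example one row per stint; the names batting, first_filter, and first_slice are made up for illustration.

```r
library(dplyr)

# Sample rows from the question, plus one HYPOTHETICAL extra 2010 row
# for acostma01 to create a tie in the first year.
batting <- read.csv(text = "playerID,yearID,G,AB,R,H
aceveju01,1997,25,6,0,0
acostma01,2010,41,0,0,0
acostma01,2010,3,0,0,0
acostma01,2011,44,0,0,0
adkinjo01,2007,1,0,0,0
agbaybe01,1998,11,15,1,2
agbaybe01,1999,101,276,42,79", stringsAsFactors = FALSE)

# Keeps every row whose year equals the player's earliest year (ties included).
first_filter <- batting %>%
  group_by(playerID) %>%
  filter(yearID == min(yearID)) %>%
  ungroup()

# Sorts first, then keeps exactly one row per player.
first_slice <- batting %>%
  arrange(playerID, yearID) %>%
  group_by(playerID) %>%
  slice(1) %>%
  ungroup()

nrow(first_filter)            # 5: acostma01 contributes both 2010 rows
nrow(first_slice)             # 4: one row per player
n_distinct(batting$playerID)  # 4
```

If the two approaches return different row counts on the real data, tied first-year rows are a likely cause, and the slice(1) count should match the number of distinct playerID values.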
massisenergy
  • In your output, the playerID is not in alphabetical order which is how I would prefer it to be. Yours is ordered by yearID. – Metsfan Jul 21 '19 at 22:36
  • Oh sorry, corrected it. Now the output is exact for the code I mentioned... – massisenergy Jul 21 '19 at 22:56
  • @Ronak Shah Today I checked the total number of first rows using these two solutions: (1) Season1_all <- Lahman_batting18 %>% group_by(playerID) %>% arrange(playerID, yearID) %>% filter(yearID == min(yearID)) and (2) Season1_all2 <- Lahman_batting18 %>% group_by(playerID) %>% slice(1). I expected the two solutions to return the same number of rows, but they do not: Solution 1 has 19,999 rows, whereas Solution 2 has 19,428 rows. Further, when I ran distinct(Lahman_batting18, playerID) I also got 19,428 rows. Why am I getting different numbers? Which solution gives the correct total? – Metsfan Jul 22 '19 at 17:52