I am new to R and working on baseball data from Retrosheet. I am trying to read multiple files from my directory. For example, the `ll` object below contains the names of two TXT files, "GL2001.TXT" and "GL2002.TXT". This is the script, and it ran without errors in my console.

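    # list.files() expects a regular expression, not a glob
    ll <- list.files(pattern = "\\.TXT$")
    ll

    # Column names come from the header of a separate CSV file
    mycols <- read_csv("game_log_header.csv") %>%
      names()

    # Read every game log with the same column names
    ll_data <- lapply(ll, read_delim,
                      col_names = mycols,
                      delim = ",",
                      na = character())

    # Stack the list of data frames into one data frame (plyr::ldply)
    ll_data_frame <- ldply(ll_data, data.frame)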

But `group_by` has no effect on this data frame. Whichever column I pass to `group_by`, R returns the same result.

    ll_data_frame %>%
      group_by(DayOfWeek) %>%
      summarize(R = sum(HomeRunsScore))

    ll_data_frame %>%
      group_by(VisitingTeam) %>%
      summarize(R = sum(HomeRunsScore))

Both calls return the same single-row result:

      R
1 23047

Why doesn't this code work? Is there a way to make it work, or another approach that would fix it? Thank you in advance.

This is a small part of my data.

    ll_data_frame %>%
      select(Date, DayOfWeek, HomeTeam, VisitingTeam,
             HomeRunsScore, VisitorRunsScored, HomeManagerName)


        Date DayOfWeek HomeTeam VisitingTeam HomeRunsScore VisitorRunsScored
1   20010401       Sun      TOR          TEX             8                 1
2   20010402       Mon      BAL          BOS             2                 1
3   20010402       Mon      CLE          CHA             4                 7
4   20010402       Mon      NYA          KCA             7                 3
5   20010402       Mon      SEA          OAK             5                 4
6   20010402       Mon      CHN          MON             4                 5
7   20010402       Mon      CIN          ATL             4                10
8   20010402       Mon      COL          SLN             8                 0
9   20010402       Mon      FLO          PHI             5                 6
10  20010402       Mon      LAN          MIL             1                 0
11  20010402       Mon      SFN          SDN             3                 2
Wingg23
  • Can you please provide a small subset of your data that we can work with to check your code? https://stackoverflow.com/help/minimal-reproducible-example – deschen May 21 '21 at 08:51
  • Not really. Can you post the result of `dput(ll_data_frame)` here or, if the data set is too large, of a subset of it, e.g. `dput(head(ll_data_frame))`? – deschen May 21 '21 at 10:22
  • @deschen I figured it out. Thanks – Wingg23 May 21 '21 at 15:14

1 Answer

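This happens because you have both the dplyr and plyr packages attached.
Since plyr was attached after dplyr, plyr's `summarize` masks dplyr's. `plyr::summarize` ignores the grouping set by `group_by`, so it collapses the whole data frame into a single row.
Call the dplyr version explicitly: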

    ll_data_frame %>%
      group_by(DayOfWeek) %>%
      dplyr::summarize(R = sum(HomeRunsScore))

    ll_data_frame %>%
      group_by(VisitingTeam) %>%
      dplyr::summarize(R = sum(HomeRunsScore))
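
To see the masking in isolation, here is a minimal sketch on a made-up three-row data frame. The `games` object and its values are assumptions for illustration only, not the asker's data, and the sketch assumes plyr is attached after dplyr, which matches the symptom in the question. The last line shows one possible way to drop plyr altogether, assuming the files share compatible column types.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)  # attached after dplyr, so plyr::summarize masks dplyr::summarize
})

# Toy data (hypothetical values, for illustration only)
games <- data.frame(
  DayOfWeek     = c("Sun", "Mon", "Mon"),
  HomeRunsScore = c(8, 2, 4)
)

# Unqualified summarize() resolves to plyr::summarize here,
# which ignores the grouping and returns a single row (R = 14)
masked <- games %>%
  group_by(DayOfWeek) %>%
  summarize(R = sum(HomeRunsScore))

# The namespace-qualified call returns one row per group (Mon = 6, Sun = 8)
explicit <- games %>%
  group_by(DayOfWeek) %>%
  dplyr::summarize(R = sum(HomeRunsScore))

# Possible alternative to plyr::ldply(): stack a list of data frames with dplyr
combined <- dplyr::bind_rows(list(games, games))
```

If `ldply` is the only reason plyr is attached, replacing it with `dplyr::bind_rows(ll_data)` may let you skip `library(plyr)` entirely, so the unqualified `summarize` would resolve to dplyr again. If both packages are needed, attaching plyr before dplyr has the same effect.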
Behnam Hedayat