I am completely new to R and to programming in general. Using an R script shared by my professor, I am gathering data on earnings in the e-sports industry to see whether there is gender inequality among top earners. I have scraped the data from https://www.esportsearnings.com/ for the top 10 played games, including all the prize money awarded in those games. However, I have no way to sort that data by whether the earner is "male", "female", "other", or "unable to tell". I can share the R code I used if need be. If anyone has code for this or can point me in the right direction, it would be greatly appreciated!

Hello again, everyone! Thank you all for responding so quickly. Below I have attached the two main scripts from my professor that I have used. As for the data I have collected, I am not sure of the best way to share it. It is essentially an Excel file with thousands of players ranked from highest to lowest earnings for the top 10 grossing games of all time. I will try to attach a screenshot of it.

[Screenshot of the Excel file]

R Script

    library(dplyr)
    library(lubridate)
    library(jsonlite)
    library(httr)
    
    
    
    DataREAD <- read.csv("allplayersStar2.csv", na.strings = "NA")
    PlayerID <- DataREAD$PlayerId
    PlayerID <- PlayerID[1:2115] #Must change this if repeating scrape due to interruption
    playerIDs <- as.numeric(gsub('[$,]', '', PlayerID))
    
    df <- data.frame(matrix(ncol = 25, nrow = 0))
    for(i in 1:length(playerIDs)) {
      #assigns the i player ID
      playerID <- playerIDs[[i]]
      APILINK <- paste0("http://api.esportsearnings.com/v0/LookupPlayerTournaments?apikey=b0e0da7e58c715f8618fbf2bb0f01920395531a048ccc4857274c6ccd7c157f9&playerid=", playerID,"&offset=0")
      Sys.sleep(1)
      jsonplayer <- APILINK %>% 
        httr::GET(config = httr::config(ssl_verifypeer = FALSE)) %>% 
        content(as = "text") %>%
        fromJSON()
      Sys.sleep(1)
      #this subset uses if to figure out if
      #there is more data left. If so it 
      #will collect another set of data
      if(nrow(jsonplayer) == 100) {
        APILINK <- paste0("http://api.esportsearnings.com/v0/LookupPlayerTournaments?apikey=b0e0da7e58c715f8618fbf2bb0f01920395531a048ccc4857274c6ccd7c157f9&playerid=", playerID,"&offset=100")
        Sys.sleep(1)
        jsonplayer2 <- APILINK %>% 
          httr::GET(config = httr::config(ssl_verifypeer = FALSE)) %>% 
          content(as = "text") %>%
          fromJSON()
        jsonplayer <- rbind(jsonplayer, jsonplayer2)
        if(nrow(jsonplayer) == 200) {
          Sys.sleep(1)
          APILINK <- paste0("http://api.esportsearnings.com/v0/LookupPlayerTournaments?apikey=b0e0da7e58c715f8618fbf2bb0f01920395531a048ccc4857274c6ccd7c157f9&playerid=", playerID,"&offset=200")
          Sys.sleep(1)
          jsonplayer2 <- APILINK %>% 
            httr::GET(config = httr::config(ssl_verifypeer = FALSE)) %>% 
            content(as = "text") %>%
            fromJSON()
          jsonplayer <- rbind(jsonplayer, jsonplayer2)
          if(nrow(jsonplayer) == 300) {
            Sys.sleep(1)
            APILINK <- paste0("http://api.esportsearnings.com/v0/LookupPlayerTournaments?apikey=b0e0da7e58c715f8618fbf2bb0f01920395531a048ccc4857274c6ccd7c157f9&playerid=", playerID,"&offset=300")
            Sys.sleep(1)
            jsonplayer2 <- APILINK %>% 
              httr::GET(config = httr::config(ssl_verifypeer = FALSE)) %>% 
              content(as = "text") %>%
              fromJSON()
            jsonplayer <- rbind(jsonplayer, jsonplayer2)
            Sys.sleep(1)
          }}}
      
      jsonplayer$Prize <- as.numeric(gsub('[$,]', '', jsonplayer$Prize))
      jsonplayer$ExchangeRate <- as.numeric(gsub('[$,]', '', jsonplayer$ExchangeRate))
      earnings <- mutate(jsonplayer, Earnings = Prize * ExchangeRate / TeamPlayers)
      #be careful to change this to the correct GameId!
      earnings <- filter(earnings, GameId == "151")
      
      playerdata <- c()
      playerdata <- append(playerdata, playerID)
      
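      # Sum earnings per calendar year, 1998 through 2021.
      # Each value is inserted right after the player ID (position 1),
      # so the newest year ends up first, matching the column names below.
      # The loop variable is j so it does not shadow the outer i.
      for(j in 1:24) {
        eYear <- j+1997
        enddate <- paste0(eYear,"-12-31")
        startdate <- paste0(eYear,"-01-01")
        # Inclusive bounds so events ending on Jan 1 or Dec 31 are counted
        earningsyear <- filter(earnings, EndDate <= enddate & EndDate >= startdate)
        if(nrow(earningsyear) > 0) {
          yearearnings <- sum(earningsyear$Earnings)
          playerdata <- append(playerdata, yearearnings, after = 1)
        } else {
          playerdata <- append(playerdata, NA, after = 1)
        }
      }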
      df <- rbind(df, playerdata)
    }
    
    
    x <- c("PlayerID", "X21", "X20", "X19", "X18",
           "X17", "X16", "X15", "X14",
           "X13", "X12", "X11", "X10",
           "X09", "X08", "X07", "X06",
           "X05", "X04", "X03", "X02",
           "X01", "X00", "X99", "X98")
    colnames(df) <- x
    
    write.csv(df, "playerearningsStar.csv")
    
    #Repeat using the following steps:
    #Change the number range in the PlayerID[1:2115] line
    #If you have scraped 70 players start from [71:end]
    #Change the number range in the write.csv
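
The nested `if` blocks in the script above request further pages at offsets 100, 200, and 300 whenever a full page of 100 rows comes back. A rough sketch of the same idea written as a loop is shown here: it keeps requesting pages until one comes back short. The names `fetch_all_pages`, `fetch_page`, `fake_fetch`, and `max_pages` are hypothetical and are not part of the professor's script or of the API. The fake fetch function only stands in for the real request so that the example runs offline.

```r
# Collect all pages for one player.
# `fetch_page` is a hypothetical function that takes an offset and returns
# a data frame with at most `page_size` rows.
fetch_all_pages <- function(fetch_page, page_size = 100, max_pages = 50) {
  pages <- list()
  offset <- 0
  for (k in seq_len(max_pages)) {
    page <- fetch_page(offset)
    # Stop when nothing usable comes back.
    if (!is.data.frame(page) || nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    # A short page means there is no more data to request.
    if (nrow(page) < page_size) break
    offset <- offset + page_size
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# Stand-in for the real API call (assumption): pretends that a player
# has 250 tournament rows, served 100 at a time.
fake_fetch <- function(offset) {
  total <- 250
  if (offset >= total) return(data.frame())
  ids <- seq(offset + 1, min(offset + 100, total))
  data.frame(TournamentId = ids, Prize = rep(10, length(ids)))
}

all_rows <- fetch_all_pages(fake_fetch)
print(nrow(all_rows))
```

In the real script, `fetch_page` would presumably wrap the existing `httr::GET` call and the `Sys.sleep(1)` pause, with the offset pasted into the URL. Unlike the nested version, this loop is not limited to four pages, so `max_pages` acts as a safety cap.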
Rui Barradas
  • You could sort your data using `dplyr::arrange` or the `order` function from base R, e.g. `arrange(mtcars, hp)` or `mtcars[order(mtcars$hp), ]` will sort `mtcars` by `hp`. – stefan Mar 16 '22 at 20:47
  • Hi Ryan! This is probably considered off-topic here. A good suggestion for learning this: start with an introductory tutorial on R, or try to find a good search term. In this case, one could look for "R sort dataframe by column" or "R group count". – Jonathan Scholbach Mar 16 '22 at 20:47
  • Hi Ryan - welcome to SO. It is definitely worth editing your question to include your current script if you want concrete help. – Allan Cameron Mar 16 '22 at 20:47
  • (and please include sample data, see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info) – r2evans Mar 16 '22 at 20:56
  • To share your data in a reproducible format, try pasting the output of `dput(DataREAD)`, which will give us the data and its structure. If your data is very long, you can give the first 20 (or however many) rows with `dput(head(DataREAD, 20))`. Also, regarding sorting by gender, which variable is the gender variable? I don't see anything in the data other than salary and player ID. Good luck! – jpsmith Mar 16 '22 at 21:21
  • In order to be able to sort by gender, you first need to have that information in your data, as a column (for example, called `gender`) whose cells can have one of the values you mentioned: "male", "female", "other", or "unable to tell". Then you can use functions such as `dplyr::arrange` or `dplyr::filter`. From the screenshot and the code you posted, it looks like this info is not in your dataframe, so you would need to get it from another dataset online (that shares the same PlayerID as the dataset you have). – Andrea M Mar 16 '22 at 22:58
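
The approach described in the comments above could look roughly like the following sketch in base R. Everything in it is an assumption made for illustration: the `TotalEarnings` column, the `gender_lookup` table, and all values are invented, and the lookup table would have to be compiled separately, since the scraped data contains no gender information.

```r
# Hypothetical earnings table, shaped loosely like the scraped data
# (all values are invented).
earnings <- data.frame(
  PlayerID = c(101, 102, 103, 104),
  TotalEarnings = c(5000, 12000, 800, 3000),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table keyed by the same PlayerID (assumption:
# this has to come from another source or be coded by hand).
gender_lookup <- data.frame(
  PlayerID = c(101, 102, 103),
  gender = c("female", "male", "other"),
  stringsAsFactors = FALSE
)

# Left join: keep every player, even those missing from the lookup.
combined <- merge(earnings, gender_lookup, by = "PlayerID", all.x = TRUE)

# Players without a match get the "unable to tell" label.
combined$gender[is.na(combined$gender)] <- "unable to tell"

# Sort by gender, then by earnings from highest to lowest.
combined <- combined[order(combined$gender, -combined$TotalEarnings), ]

# Total earnings and number of players per gender category.
summary_by_gender <- aggregate(TotalEarnings ~ gender, data = combined, FUN = sum)
counts_by_gender <- table(combined$gender)

print(combined)
print(summary_by_gender)
print(counts_by_gender)
```

With dplyr loaded, `left_join`, `arrange`, and `group_by` with `summarise` would do the same job as `merge`, `order`, and `aggregate` here.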

0 Answers