I want to clean up my data but I'm quite new to R.

My code:

#library

library(dplyr)
library(shiny)
library(shinythemes)
library(ggplot2)

#Source
dataset <- read.csv("Wagegap.csv")

SFWage <- dataset %>% select(gender, JobTitle, BasePay, Year)

clean <- na.omit(SFWage)


#UI
ui <- fluidPage(
  
  theme = shinytheme("united"),
  
  navbarPage("San Francisco Wages",

             tabPanel("Barplot",
                    
                      mainPanel(
                        plotOutput("barplot")
                        
                      )) ,
             tabPanel("Table", 
                      mainPanel(
                        dataTableOutput("table")
                      ))                    
  )
  
)                                 
 
#server
server <- function(input, output) {

  output$barplot <- renderPlot({
    ggplot(clean, aes(x = JobTitle, y = BasePay)) +
      geom_bar(stat = "identity", width = 0.3, fill = "orange") +
      labs(x = "Jobs", y = "Wage", title = "Wage per job")
  })

  output$table <- renderDataTable({
    clean
  })
}

shinyApp(ui = ui, server = server)

I get this output for the table: [screenshot: Table Output]

What I would like is for the rows to be grouped by JobTitle and separated only by gender, showing the average BasePay:

M - account clerk - averageBasePay - Year

F - account clerk - averageBasePay - Year

and so on for every job.

This data will be used to compare the wage gap between genders for every job in the dataset.

P.S.

If someone could also tell me why na.omit didn't work to clean up the empty genders, that would be amazing :)

T R
  • I guess `na.omit()` didn't "work" because the missing values are empty strings ("") rather than NA. Try `clean <- SFWage %>% filter(gender != "")` – HubertL Nov 14 '20 at 01:27
  • Possible duplicate/Relevant https://stackoverflow.com/questions/11562656/calculate-the-mean-by-group – Ronak Shah Nov 14 '20 at 01:31
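
To illustrate the point made in the first comment, here is a minimal sketch with made-up toy data (the values are hypothetical, not taken from Wagegap.csv). It assumes the blank genders were read in as empty strings. `na.omit()` only drops rows that contain `NA`, so those rows survive until the empty strings are either filtered out or converted to `NA` with `dplyr::na_if()`.

```r
library(dplyr)

# Hypothetical toy data standing in for the real CSV
toy <- data.frame(
  gender   = c("M", "F", "", "F"),
  JobTitle = c("account clerk", "account clerk", "account clerk", "nurse"),
  BasePay  = c(100, 90, 80, 120),
  stringsAsFactors = FALSE
)

# "" is an ordinary string, not NA, so na.omit() keeps every row
rows_after_na_omit <- nrow(na.omit(toy))

# Option 1: filter the empty strings out directly
clean_filter <- toy %>% filter(gender != "")

# Option 2: turn "" into NA first, after which na.omit() drops the row
clean_na <- toy %>%
  mutate(gender = na_if(gender, "")) %>%
  na.omit()
```

Another option, if the blanks should be treated as missing from the start, is to pass `na.strings = c("", "NA")` to `read.csv()` so that empty fields are read in as `NA`.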

1 Answer

You could use group_by() and summarise():

SFWage <- dataset %>%
  group_by(gender, JobTitle, Year) %>%
  # one row per gender/job/year combination, with the mean of BasePay
  summarise(averageBasePay = mean(BasePay, na.rm = TRUE)) %>%
  select(gender, JobTitle, averageBasePay, Year)
HubertL
  • Hey, the filter function and the group_by worked perfectly. However, I can't calculate the mean because my BasePay column is character, so it returns NA. I have tried as.numeric(BasePay), but it won't make it numeric. – T R Nov 14 '20 at 01:38
  • Probably because they have "." decimal separators while your locale uses ",". Try `as.numeric(gsub(".", ",", BasePay))` – HubertL Nov 14 '20 at 01:43
  • It won't recognize the BasePay column. I have tried pasting the code inside the SFWage <- pipeline using %>%. – T R Nov 14 '20 at 01:55
  • `summarise(averageBasePay = mean(as.numeric(gsub(".", ",", BasePay)), na.rm=TRUE)) %>%` – HubertL Nov 15 '20 at 04:56
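
A caveat on the `gsub()` suggestion in the comments: in a regular expression an unescaped `.` matches any character, so `gsub(".", ",", x)` replaces every character with a comma, and `as.numeric()` parses only `.` as the decimal separator regardless of locale. The sketch below uses made-up values and assumes the column is character because it contains some non-numeric entries (the placeholder "Not Provided" is a guess, not something confirmed in the question). It converts the column inside the pipeline, where the column name can be resolved, and then averages per group. The `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data; "Not Provided" is an assumed placeholder value
toy <- data.frame(
  gender   = c("M", "M", "F", "F", "F"),
  JobTitle = rep("account clerk", 5),
  Year     = rep(2014, 5),
  BasePay  = c("100.50", "99.50", "90.00", "Not Provided", "110.00"),
  stringsAsFactors = FALSE
)

# An unescaped "." is a regex wildcard: every character becomes ","
wildcard_result <- gsub(".", ",", "100.50")

# Inspect which entries fail to parse before converting
bad_values <- toy$BasePay[is.na(suppressWarnings(as.numeric(toy$BasePay)))]

averages <- toy %>%
  # Unparseable entries become NA (the coercion warning is silenced on purpose)
  mutate(BasePay = suppressWarnings(as.numeric(BasePay))) %>%
  group_by(gender, JobTitle, Year) %>%
  summarise(averageBasePay = mean(BasePay, na.rm = TRUE), .groups = "drop")
```

Looking at `bad_values` first shows what is actually blocking the conversion. If the entries turn out to use a comma as the decimal separator, replacing it the other way round with `gsub(",", ".", BasePay, fixed = TRUE)` before `as.numeric()` would be the fix.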