
My code so far (MWE):

# https://www.kaggle.com/kaggle/kaggle-survey-2017/data

#### Primary analysis of the dataset ####
response <- read.csv(file = "multipleChoiceResponses.csv",na.strings = "")

# Select only some of the variables:
Variables <- c("GenderSelect","Country","Age","CurrentJobTitleSelect","MLToolNextYearSelect","LanguageRecommendationSelect","FormalEducation",
               "FirstTrainingSelect","EmployerIndustry")

# Keep only the selected variables in memory:
response <- response[,Variables]

# Because the other categories have few responses, keep only Male and Female
Response <- response[response$GenderSelect %in% c("Male", "Female"), ]

# Add a column (continent) with the continent each country (Country) belongs to
library(countrycode)
Response$continent <- countrycode(sourcevar = Response[, "Country"],
                                  origin = "country.name",
                                  destination = "continent")

# Convert this new variable to a factor
Response$continent <- as.factor(Response$continent)


# Remove rows that contain NA values
Response <- Response[complete.cases(Response), ]

# Renumber the rows consecutively
rownames(Response) <- 1:nrow(Response)


# Drop factor levels that no longer occur after the filtering above
Response <- droplevels(Response)


bp_Continent <- barplot(table(Response$continent),
                        main = "Distribucion de DS por continentes",
                        ylim = c(0,3500)
)

# Add the GenderSelect proportion by continent in the label argument (instead of "BLABLABLA")
text(x = bp_Continent, y = table(Response$continent), label = "BLABLABLA", pos = 3, cex = 0.8, col = "red")

Basically, the script loads the data, selects some of the variables, creates a new variable (continent) and finally cleans the data. Next, it draws a barplot; I want to place the proportion of men and women on top of each bar.

[Image: the resulting barplot, with the placeholder "BLABLABLA" above each bar]

What I want to do is replace "BLABLABLA" with the proportion of men and women (the GenderSelect variable) in each continent.

My question is not the same as: How to display the frequency at the top of each factor in a barplot in R

What I am after is computing the proportion and printing it above the bars.
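For reference, the per-continent computation itself can be sketched with base R's table() and prop.table(). This is only a sketch on a small hypothetical data frame toy (not the Kaggle data), assuming the two columns are named GenderSelect and continent as in the code above:

```r
# Hypothetical toy data standing in for the cleaned Response data frame
toy <- data.frame(
  GenderSelect = c("Male", "Female",
                   "Male", "Male", "Female",
                   "Male", "Male", "Male", "Female"),
  continent    = c("Africa", "Africa",
                   "Asia", "Asia", "Asia",
                   "Europe", "Europe", "Europe", "Europe")
)

# Counts: one row per continent, one column per gender (Female, Male)
counts <- table(toy$continent, toy$GenderSelect)

# Proportions within each continent: every row sums to 1
props <- prop.table(counts, margin = 1)

# Ratio of men to women in each continent
ratio <- counts[, "Male"] / counts[, "Female"]

# One text label per continent, in the same order as table(toy$continent)
labels <- sprintf("M: %.0f%% / F: %.0f%%",
                  100 * props[, "Male"], 100 * props[, "Female"])
```

Either labels or ratio can then be passed to the label argument of text(). A continent with no women would give an infinite ratio, so the percentage form may be the safer label.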

robintux

2 Answers


After reading Rui's answer, I thought of another solution.

First, a function that computes the ratio of men to women in a given continent, and then sapply over the continents.

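# Ratio of men (hombres) to women (mujeres) in continent conti
CreaEtiq <- function(conti){
  # Number of men in this continent
  NumHContin <- sum(Response$GenderSelect == "Male" & Response$continent == conti)
  # Number of women in this continent
  NumMACntin <- sum(Response$GenderSelect == "Female" & Response$continent == conti)
  return(round(NumHContin/NumMACntin, 2))
}
# One label per continent, in the same (level) order as the bars
EtiquetaBarPlot <- sapply(levels(Response$continent), CreaEtiq)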
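A variant that avoids filtering the data frame once per continent: cross-tabulate once with table() and divide the two gender columns. This is a sketch on a small hypothetical data frame toy that stands in for the cleaned Response; it assumes, as above, that the gender values are exactly "Male" and "Female".

```r
# Hypothetical stand-in for the cleaned Response data frame
toy <- data.frame(
  GenderSelect = c("Male", "Female",
                   "Male", "Male", "Female",
                   "Male", "Male", "Male", "Female", "Female",
                   "Male", "Female", "Female"),
  continent    = factor(c(rep("Africa", 2), rep("Asia", 3),
                          rep("Europe", 5), rep("Oceania", 3)))
)

# One row per continent (in level order), one column per gender
counts <- table(toy$continent, toy$GenderSelect)

# Men-to-women ratio per continent, rounded to 2 decimals like CreaEtiq()
EtiquetaAlt <- round(counts[, "Male"] / counts[, "Female"], 2)

# The per-continent function approach on the same data, for comparison
EtiquetaSapply <- sapply(levels(toy$continent), function(conti) {
  n_men   <- sum(toy$GenderSelect == "Male"   & toy$continent == conti)
  n_women <- sum(toy$GenderSelect == "Female" & toy$continent == conti)
  round(n_men / n_women, 2)
})
```

Both vectors follow the factor level order, which is also the order of the bars drawn by barplot(table(...)), so the table-based version should be usable in the same text() call.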

Finally, draw the plot:

bp_Continent <- barplot(table(Response$continent),
                        main = "Distribucion de DS por continentes",
                        ylim = c(0,3500)
)
text(x = bp_Continent, y = table(Response$continent),
     # H/M = hombres/mujeres (men/women)
     label = paste("H/M =", EtiquetaBarPlot),
     pos = 3, cex = 0.8, col = "red")

This gives the following graph:

[Image: the barplot with the H/M ratio above each bar]

robintux

The code below uses a made-up data set, created at the end.
Once the proportions are computed, all that is needed is to pass them to function text, argument label.

Compute the proportions.

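# Number of respondents per continent (the bar heights)
tbl <- table(Response$continent)
# Counts by gender (rows) and continent (columns)
xt <- xtabs( ~ GenderSelect + continent, Response)
# Divide each column by its continent total to get proportions
prop <- sweep(xt, 2, tbl, `/`)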
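Base R also has prop.table(), which divides each entry by the total over the chosen margin. With margin = 2 it divides each column of xt by its column total, so, as long as there are no missing GenderSelect values (the column totals then equal tbl), it should give the same proportions as the sweep() call. A quick check on hypothetical toy data:

```r
# Hypothetical toy data with the same column names as Response
toy <- data.frame(
  GenderSelect = c("Male", "Female",
                   "Male", "Male", "Female",
                   "Male", "Male", "Male", "Female", "Female"),
  continent    = c(rep("Africa", 2), rep("Asia", 3), rep("Europe", 5))
)

tbl <- table(toy$continent)
xt  <- xtabs(~ GenderSelect + continent, toy)

# Approach from the answer: divide each column by the continent total
prop_sweep <- sweep(xt, 2, tbl, `/`)

# Same idea with prop.table(): proportions within each column
prop_pt <- prop.table(xt, margin = 2)
```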

Now plot the bars. The labels are the proportions of "Male", which is row 2 of prop because the gender levels are sorted alphabetically (Female, Male).

bp_Continent <- barplot(tbl,
                        main = "Distribucion de DS por continentes",
                        ylim = c(0, 3500)
)
text(x = bp_Continent, y = tbl, 
     label = round(prop[2, ], 2), 
     pos = 3, cex = 0.8, col = "red")

[Image: the barplot with the proportion of "Male" above each bar]

Other labels could be, for instance, these:

sprintf("F: %1.2f/M: %1.2f", prop[1,], prop[2,])
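To see what such labels look like, here is sprintf() applied to a small hypothetical prop matrix laid out like the one above (rows Female and Male, one column per continent); the values are made up for illustration:

```r
# Hypothetical proportions: rows = gender, columns = continent
prop <- matrix(c(0.25, 0.75,   # Africa: Female, Male
                 0.40, 0.60),  # Asia:   Female, Male
               nrow = 2,
               dimnames = list(GenderSelect = c("Female", "Male"),
                               continent    = c("Africa", "Asia")))

# Two decimals, as in the answer
labels_dec <- sprintf("F: %1.2f/M: %1.2f", prop[1, ], prop[2, ])

# The same information as whole percentages
labels_pct <- sprintf("F: %.0f%%/M: %.0f%%", 100 * prop[1, ], 100 * prop[2, ])
```

Either vector has one element per continent and can be passed to the label argument of text().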

Data creation code.

set.seed(1234)
n <- 5e3
GenderSelect <- c("Male", "Female")
GenderSelect <- sample(GenderSelect, n, TRUE)
continent <- c("Africa", "Americas", "Asia", "Europe", "Oceania")
continent <- sample(continent, n, TRUE, prob = c(1, 20, 14, 16, 2))
Response <- data.frame(GenderSelect, continent)
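In the data-creation code, the prob weights passed to sample() do not need to sum to 1; sample() treats them as relative weights. A sketch of the implied probabilities and the expected number of rows per continent for n = 5000 (the names on w are labels added here for readability):

```r
# Relative weights from the data-creation code, one per continent
w <- c(Africa = 1, Americas = 20, Asia = 14, Europe = 16, Oceania = 2)

# Normalised probabilities of drawing each continent
p <- w / sum(w)

# Expected number of respondents per continent out of n = 5000
n <- 5e3
expected <- round(n * p)
```

The actual counts drawn with set.seed(1234) will differ somewhat from these expected values.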
Rui Barradas