R data.frame Aggregate data to calculate diversity ratio

Question

I have a data frame of demographic data in R:

Name  Region  Gender
A     1       F
B     2       M
C     1       F
D     1       M
E     2       M

I want to calculate gender ratio for every region. Output should look like:

Region  GenderRatio
1       0.67
2       0.50

This can be calculated by hand with ordinary arithmetic (BODMAS). Is there an efficient way to calculate it in R?

Output does not match your input: Region 2 is 100% male in your example data. — neilfws, Aug 29 '18 at 05:49
Hello and welcome to SO. You might want to read [this](https://stackoverflow.com/help/mcve) and [that](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to improve the way you ask questions. This simplifies things for the people willing to help you and might therefore also improve your life. ;-) — symbolrush, Aug 29 '18 at 06:05

symbolrush · Accepted Answer · 2018-08-29T06:01:20.280

You can use the dplyr library in R for all sorts of data manipulation. See here to learn more about dplyr and other extremely useful R packages.

An example:

First I create some sample data. (I changed it a little bit to actually have a gender ratio that fits your output.)

df <- data.frame(name = c("A", "B", "C", "D", "E"),
                 region = c(1,2,1,1,2),
                 gender = c("F", "M", "F", "M", "F"))

Now we can calculate gender_ratio and summarise the data. The mutate function creates the new variable gender_ratio. The first group_by makes the calculation run separately for each region; the second group_by, followed by summarise, reduces the output to one row per region.

library(dplyr)
df %>%
  group_by(region) %>%
  mutate(gender_ratio = sum(gender == "F") / length(gender)) %>%
  group_by(region, gender_ratio) %>%
  summarise()

Output is:

  region gender_ratio
   <dbl>        <dbl>
1      1        0.667
2      2        0.5
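
The same per-region ratio can probably be written more compactly. The following is a sketch rather than part of the original answer: it assumes the sample data defined above and computes the value directly inside summarise, using the fact that the mean of a logical vector is the fraction of TRUE values. The name ratios is only illustrative.

```r
library(dplyr)

# Sample data from the answer above (region 2 has one F and one M).
df <- data.frame(name = c("A", "B", "C", "D", "E"),
                 region = c(1, 2, 1, 1, 2),
                 gender = c("F", "M", "F", "M", "F"))

# One row per region; mean(gender == "F") is the share of females.
ratios <- df %>%
  group_by(region) %>%
  summarise(gender_ratio = mean(gender == "F"))

print(ratios)
```

This skips the intermediate mutate step, because summarise already collapses each group to a single row.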

Hope this helps.

Thanks @symbolrush. This worked. — Anurag Jajoo, Aug 29 '18 at 07:44

Maurits Evers · Answer 2 · 2018-08-29T06:11:38.983

As a (base R) alternative, you can use by with prop.table(table(...)) to return the female and male fractions for each region. Here df is the data from the question, with columns Region and Gender:

with(df, by(df, Region, function(x) prop.table(table(x$Gender))))
#Region: 1
#
#        F         M
#0.6666667 0.3333333
#------------------------------------------------------------
#Region: 2
#
#F M
#0 1

Or, to return only the male fraction (the [2] picks the second level, M, so this relies on Gender being a factor with levels F and M):

with(df, by(df, Region, function(x) prop.table(table(x$Gender))[2]))
#Region: 1
#[1] 0.3333333
#------------------------------------------------------------
#Region: 2
#[1] 1

Or, to store the male fraction and region in a data.frame, simply stack the above result:

setNames(
    stack(with(df, by(df, Region, function(x) prop.table(table(x$Gender))[2]))),
    c("GenderRatio", "Region"))
#  GenderRatio Region
#1   0.3333333      1
#2   1.0000000      2
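
For a self-contained run of the base R approach, here is a minimal sketch. It assumes the data from the question and stores Gender explicitly as a factor, since data.frame() in R 4.0 and later no longer converts strings to factors by default, and a region with no females would otherwise have a one-entry table. It also selects the male fraction by name ("M") instead of by position. The names male_frac and male_frac2 are only illustrative.

```r
# Data from the question. Gender is a factor so that both levels
# ("F", "M") are tabulated in every region, even when one is absent.
df <- data.frame(Name   = c("A", "B", "C", "D", "E"),
                 Region = c(1, 2, 1, 1, 2),
                 Gender = factor(c("F", "M", "F", "M", "M"),
                                 levels = c("F", "M")))

# Fraction of males per region, selected by name rather than by position.
male_frac <- with(df, by(df, Region,
                         function(x) prop.table(table(x$Gender))["M"]))

# Shortcut: the mean of a logical vector is the fraction of TRUE values.
male_frac2 <- tapply(df$Gender == "M", df$Region, mean)

print(male_frac)
print(male_frac2)
```

Both objects hold one value per region; the tapply version avoids building the full table when only one fraction is needed.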
