I have a data frame of demographic data in R:

Name  Region  Gender
A     1       F
B     2       M
C     1       F
D     1       M
E     2       M

I want to calculate the gender ratio for every region. The output should look like:

Region  GenderRatio
1       0.67
2       0.50

This can be worked out by hand with basic arithmetic (BODMAS). Is there an efficient way to calculate it in R?

iBug
Output does not match your input: Region 2 is 100% male in your example data. – neilfws Aug 29 '18 at 05:49

Hello and welcome to SO. You might want to read [this](https://stackoverflow.com/help/mcve) and [that](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to improve the way you ask questions. This simplifies things for the people willing to help you and might therefore also improve your life. ;-) – symbolrush Aug 29 '18 at 06:05

2 Answers

You can use the dplyr package in R for all sorts of data manipulation. See here to learn more about dplyr and other extremely useful R packages.

An example:

First I create some sample data. (I changed it a little bit to actually have a gender ratio that fits your output.)

df <- data.frame(name = c("A", "B", "C", "D", "E"),
                 region = c(1,2,1,1,2),
                 gender = c("F", "M", "F", "M", "F"))

Now we can calculate gender_ratio and summarise the data. The function mutate creates the new variable gender_ratio; because the data is first grouped by region with group_by, the ratio is calculated per region. The second group_by, followed by an empty summarise(), then collapses the result to one row per region so that only the summarised data is output.

library(dplyr)

df %>%
  group_by(region) %>%
  mutate(gender_ratio = sum(gender == "F") / length(gender)) %>%
  group_by(region, gender_ratio) %>%
  summarise()

Output is:

  region gender_ratio
   <dbl>        <dbl>
1      1        0.667
2      2        0.5
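
The same numbers can probably be reached in a single summarise step, since the mean of a logical vector is the share of TRUE values. A minimal sketch, assuming the sample df defined above (repeated here so the snippet runs on its own); the name ratio_by_region is only illustrative:

```r
library(dplyr)

# Sample data from this answer, repeated so the snippet is self-contained
df <- data.frame(name = c("A", "B", "C", "D", "E"),
                 region = c(1, 2, 1, 1, 2),
                 gender = c("F", "M", "F", "M", "F"))

# mean() of a logical vector is the fraction of TRUE values,
# so this gives the female share within each region
ratio_by_region <- df %>%
  group_by(region) %>%
  summarise(gender_ratio = mean(gender == "F"))

ratio_by_region
```

This skips the intermediate mutate and the second group_by, at the cost of being a little less explicit about each step.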

Hope this helps.

symbolrush

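As a base R alternative, you can use by with prop.table(table(...)) to return a list of fractions for both male and female. Here df is the question's original data (columns Region and Gender, with Gender stored as a factor so that empty levels are still counted):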

with(df, by(df, Region, function(x) prop.table(table(x$Gender))))
#Region: 1
#
#        F         M
#0.6666667 0.3333333
#------------------------------------------------------------
#Region: 2
#
#F M
#0 1

Or, to return only the male fraction:

with(df, by(df, Region, function(x) prop.table(table(x$Gender))[2]))
#Region: 1
#[1] 0.3333333
#------------------------------------------------------------
#Region: 2
#[1] 1

Or, to store the male fraction and region in a data.frame, simply stack the above result:

setNames(
    stack(with(df, by(df, Region, function(x) prop.table(table(x$Gender))[2]))),
    c("GenderRatio", "Region"))
#  GenderRatio Region
#1   0.3333333      1
#2   1.0000000      2
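
A shorter base R route, offered as a sketch rather than a replacement for the above, is tapply with mean on a logical vector. It assumes df holds the question's original data with columns Region and Gender (rebuilt here so the snippet is self-contained), and it does not depend on whether Gender is a factor or a character column; male_share is an illustrative name.

```r
# Question's original data (Region 2 is all male)
df <- data.frame(Name = c("A", "B", "C", "D", "E"),
                 Region = c(1, 2, 1, 1, 2),
                 Gender = c("F", "M", "F", "M", "M"))

# Fraction of rows with Gender == "M" within each Region
male_share <- tapply(df$Gender == "M", df$Region, mean)

# Turn the named result into a two-column data.frame
data.frame(Region = names(male_share),
           GenderRatio = as.vector(male_share))
```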
Maurits Evers