I have a large survey dataset on women in the labour force. The answers are categorical values with different data labels. The dataset consists of 63,000 responses and 2,000 variables, but I have attached a small snippet of the relevant variables below, along with the data labels.

I need to construct a line graph of the age profile of women in the labour force by geographical location. I have data for age, current employment (0 = no, 1 = yes) and place of residence (1 = urban, 2 = rural), but as a beginner I cannot figure out how to combine these variables and plot them. I would like the proportion of women currently employed on the y-axis and age on the x-axis, with two lines: one for urban and one for rural.

I have attached an image of the kind of output I have in mind, together with the snippet of the variables. Since I couldn't add two separate images, I have put them together. I understand that I can show urban and rural separately using facet_grid, but I'm having trouble figuring out how to combine the data.

Image link

I would greatly appreciate any help.

user1110
  • Welcome to SO! To help us to help you could you please make your problem reproducible by sharing a sample of your data and the code you tried? Simply type `dput(head(NAME_OF_DATASET, 20))` into the console (which will give the first 20 rows of your data) and copy & paste the output starting with `structure(....` into your post. See also [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – stefan Aug 20 '20 at 07:43

1 Answer

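Welcome! As @stefan said, it is easier to help if we can see some of your data, so I simulated some from your description. The idea is to count responses by age, place and employment status, turn the counts into proportions within each age and place, keep only the employed rows, and plot one line per place.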

library(tidyverse) # loads dplyr, ggplot2 and the %>% pipe

set.seed(1) # make the simulated data reproducible

place = sample(c(1, 2), 63000, replace = TRUE)  # 1 = Urban and 2 = Rural
employ = sample(c(0, 1), 63000, replace = TRUE) # 0 = Not Employed and 1 = Employed
age = sample(20:45, 63000, replace = TRUE)      # Age


df = data.frame(place, employ, age)
df %>%
  group_by(age, place, employ) %>%
  summarise(n = n()) %>%        # result is still grouped by age and place
  mutate(prop = n / sum(n)) %>% # share of each employment status within age and place
  filter(employ == 1) %>%       # keep only the employed share
  mutate(newplace = case_when(place == 1 ~ "Urban", place == 2 ~ "Rural")) %>%
  ggplot(aes(x = age, y = prop, color = newplace)) +
  geom_line(aes(linetype = newplace)) +
  scale_color_manual(values = c("blue", "red")) + # Or colors of your choice
  labs(title = "Proportion of Women Employed:\n Comparing Urban and Rural Communities",
       y = "Proportion of Employed Women",
       color = "", linetype = "") + # Removed legend titles since they were redundant
  theme_classic() +
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom")
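
Since the question mentions facet_grid, here is a rough sketch of that variant. It relies on `employ` being coded 0/1, so the mean of `employ == 1` within each age and place is already the proportion employed. The column names, the simulated data and the object names `props` and `p` are assumptions for illustration, not taken from the real survey. The `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Simulated stand-in for the survey data (column names are assumed)
df <- data.frame(
  place  = sample(c(1, 2), 5000, replace = TRUE), # 1 = Urban, 2 = Rural
  employ = sample(c(0, 1), 5000, replace = TRUE), # 0 = no, 1 = yes
  age    = sample(20:45, 5000, replace = TRUE)
)

# Proportion employed per age and place; NA answers are ignored
props <- df %>%
  mutate(place = factor(place, levels = c(1, 2), labels = c("Urban", "Rural"))) %>%
  group_by(age, place) %>%
  summarise(prop = mean(employ == 1, na.rm = TRUE), .groups = "drop")

# One panel per place instead of two lines in a single panel
p <- ggplot(props, aes(x = age, y = prop)) +
  geom_line() +
  facet_grid(. ~ place) +
  labs(x = "Age", y = "Proportion employed") +
  theme_classic()
print(p)
```

With real survey data you may also need to handle missing-value codes and survey weights, which this sketch does not do.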
Jonni