I want to create a chart with ggplot that plots "var_share" (y-axis) against "cbo" (x-axis), split into three time periods: 1996-2002, 2002-2008 and 2008-2012. I also want to express the "cbo" variable in percentiles. Here is a sample of my dataset:

    ano   cbo ocupado quant total  share var_share
  <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>     <dbl>
1  1996    20       1    32 39675 0.0807   -0.343 
2  1997    20       1    52 41481 0.125     0.554 
3  1998    20       1    34 40819 0.0833   -0.336 
4  1999    20       1    44 41792 0.105     0.264 
5  2001    20       1    57 49741 0.115     0.0884
6  1996    21       1   253 39675 0.638    -0.0326

You can download the full dataset here.

The result should look roughly like this:

[image of the target chart; not reproduced here]

Z.Lin
Mateus Maciel
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Jul 25 '20 at 22:54
  • Here is the link: https://www.dropbox.com/s/bcxhjtnrm5cle6l/ocupacoes.xlsx?dl=0 – Mateus Maciel Jul 25 '20 at 22:59
  • Are you sure you want cbo on x and var_share on y? Please add what you have tried to your question. With the variables you mentioned, the resulting graph is completely different from the one shown. – Duck Jul 26 '20 at 02:05
  • It would be the percentiles of the var_share. I think the results may differ from the figure, but I could not even think of a way to do it, so I was not able to try anything. However, you can show what you got in your plot. Maybe it is right. – Mateus Maciel Jul 26 '20 at 02:18
  • Welcome to StackOverflow! If you want to improve your question (for example: don't require others to download external files), here is some information on [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). The MRE will make it easier for others to find and test an answer to your question. That way you can help others to help you! Also: Feel free to ignore all this if you think your question is fine and has good answers! – dario Jul 26 '20 at 07:56
  • @MateusMaciel Do I understand correctly that you want the data grouped into three bins, with those bins forming the x-axis? That is, you want the variable *var_share* as an average per bin? – MarBlo Jul 26 '20 at 12:15
  • @MarBlo, this is exactly what I want. – Mateus Maciel Jul 26 '20 at 12:21

1 Answer


I believe this is what you are looking for. After reading your data in, I build a new variable called ano2 and then a new data frame, ddf, whose column new holds the bins you defined.

The first plot then builds on this DF and uses stat_summary.

You also mentioned quantiles. I am not sure exactly what you meant, but I grouped by this new variable and used a purrr-style lambda in group_modify() to calculate the quantiles of cbo.

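library(tidyverse)

# `ocupacoes` is assumed to be the data frame already read from the asker's
# spreadsheet, e.g. with readxl::read_excel("ocupacoes.xlsx")
df <- ocupacoes
# Turn the numeric year into a Date (1 January of that year)
df$ano2 <- readr::parse_date(paste0('01-01-', df$ano), '%d-%m-%Y')

# Assign each year to one of the three periods; the boundary years 2002 and
# 2008 go to the earlier period so that the bins do not overlap
ddf <- df %>%
  mutate(new = case_when(
    lubridate::year(ano2) %in% 1996:2002 ~ '96-02',
    lubridate::year(ano2) %in% 2003:2008 ~ '02-08',
    lubridate::year(ano2) %in% 2009:2012 ~ '08-12'
  ))

# Mean of var_share per period, drawn as one red point per bin
ggplot(ddf, aes(x = new, y = var_share)) +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  scale_x_discrete(limits = c('96-02', '02-08', '08-12'))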


# I think you were also looking for quantiles of cbo:
# quintile boundaries (0%, 20%, ..., 100%) of cbo within each period
ddf %>%
  group_by(new) %>%
  group_modify(~ {
    quantile(.x$cbo, probs = seq(0, 1, by = .2)) %>%
      tibble::enframe(name = "prob", value = "quantile")
  }) %>%
  ggplot(aes(x = prob, y = quantile, color = new, group = new)) +
    geom_line() +
    scale_x_discrete(limits = c('0%', '20%', '40%', '60%', '80%', '100%'))
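
For a version that runs without the spreadsheet, the binning and the per-period mean can be sketched on a small made-up data frame. The `toy` values below are invented for illustration and are not the asker's data, and `cut()` is swapped in for `case_when()` only as an alternative way to express the same three bins (2002 and 2008 again fall into the earlier period).

```r
library(dplyr)

# Toy data: made-up values, NOT the asker's dataset
toy <- tibble::tibble(
  ano       = c(1996, 1999, 2002, 2003, 2008, 2009, 2012),
  var_share = c(-0.3, 0.5, 0.1, 0.2, 0.4, -0.1, 0.3)
)

# cut() uses right-closed intervals by default:
# (1995, 2002], (2002, 2008], (2008, 2012]
period_means <- toy %>%
  mutate(new = cut(ano,
                   breaks = c(1995, 2002, 2008, 2012),
                   labels = c("96-02", "02-08", "08-12"))) %>%
  group_by(new) %>%
  summarise(mean_var_share = mean(var_share), .groups = "drop")

print(period_means)
```

Because `cut()` returns a factor whose levels follow the order of `labels`, a plot built on `new` should already show the periods in chronological order, so the explicit `scale_x_discrete(limits = ...)` may not be needed in that case.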

MarBlo