1

I am trying to find track the median quarterly wages for individuals based on the quarter they graduated, their demographic information, the program they graduated from, etc. For instance, if students graduated in 2012Q1 I would like to track their wages from 2012Q1 through 2020Q4 and be able to change the criteria quickly. Below is an example of the data I am dealing with.

People |grad.date|race|yrqtr|wage|program
PersonA|20121    |H   |20121|50  |Health
PersonA|20121    |H   |20122|50  |Health
PersonB|20122    |W   |20121|50  |Business
PersonB|20122    |W   |20122|75  |Business
PersonC|20131    |W   |20131|100 |Business
PersonC|20131    |W   |20132|50  |Business
PersonD|20121    |A   |20121|100 |Economics
PersonD|20121    |A   |20122|100 |Economics

Let's say I wanted to get the median wages for individuals that graduated in 2012Q1, the desired output would be:

People |grad.date|race|yrqtr  |wage|program  |Med.20121|Med.20122
PersonA|20121    |H   |20121  |50  |Health   |75       |75
PersonA|20121    |H   |20122  |50  |Health   |75       |75
PersonD|20121    |A   |20121  |100 |Economics|75       |75
PersonD|20121    |A   |20122  |100 |Economics|75       |75
  • Can you show the expected output – akrun Dec 19 '22 at 16:48
  • I have edited with the desired output, however, I would like to add stipulations if I just wanted to look at Health, for example. – Connor Hill Dec 19 '22 at 17:14
  • 1
    Have a look at the FAQ [Calculate group mean, sum, or other summary stats. and assign column to original data](https://stackoverflow.com/q/6053620/903061). You can use `median` as your summary function. – Gregor Thomas Dec 19 '22 at 17:29

1 Answers1

0

Something like this?
After computing the medians you can filter the output below by program == "Health", for instance.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

df1 %>%
  filter(grad.date == 20121) %>%
  group_by(People, grad.date) %>%
  mutate(wage = median(wage)) %>%
  left_join(
    df1 %>%
      group_by(grad.date) %>%
      mutate(Med = median(wage))
  ) %>% 
  pivot_wider(names_from = yrqtr, values_from = Med,
              names_glue = "{.value}.{yrqtr}")
#> Joining, by = c("People", "grad.date", "race", "yrqtr", "wage", "program")
#> # A tibble: 2 × 7
#> # Groups:   People, grad.date [2]
#>   People  grad.date race   wage program   Med.20121 Med.20122
#>   <chr>       <dbl> <chr> <dbl> <chr>         <dbl>     <dbl>
#> 1 PersonA     20121 H        50 Health           75        75
#> 2 PersonD     20121 A       100 Economics        75        75

Created on 2022-12-19 with reprex v2.0.2


Data

df1 <- "People |grad.date|race|yrqtr|wage|program
PersonA|20121    |H   |20121|50  |Health
PersonA|20121    |H   |20122|50  |Health
PersonB|20122    |W   |20121|50  |Business
PersonB|20122    |W   |20122|75  |Business
PersonC|20131    |W   |20131|100 |Business
PersonC|20131    |W   |20132|50  |Business
PersonD|20121    |A   |20121|100 |Economics
PersonD|20121    |A   |20122|100 |Economics"
df1 <- read.table(textConnection(df1), header = TRUE, sep = "|")
df1$race <- trimws(df1$race)

Created on 2022-12-19 with reprex v2.0.2

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66