How to use count-valued rows to create a histogram with ggplot

Question

I'm trying to show the distribution of salaries for a particular occupation. The BLS data is reported by county, with one row per county. The code below almost gives me what I want, but the y-axis counts rows, i.e. counties, rather than employees.

So a county with 10 employees and an average income of 50k counts the same as a county with 100 employees and an average income of 80k. I know I could expand each county row by its number of employees, giving 10 rows of 50k and 100 rows of 80k, but I'm sure there is a better approach that I just can't find.

library(ggplot2)
library(scales)  # provides label_comma()

ggplot(Construction[which(Construction$avg_annual_pay > 0), ], aes(x = avg_annual_pay)) +
  geom_histogram(binwidth = 5000, colour = "black", fill = "white") +
  scale_x_continuous(labels = label_comma())

county	avg # employees	avg annual pay
1	34	47000
2	900	88000
3	85	40000

I tried setting y = avg_employees, but geom_histogram does not accept both x and y aesthetics.
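One possible direction, offered as a sketch rather than a confirmed answer: the binning stat behind geom_histogram understands a `weight` aesthetic, so each row can contribute its employee count to the bar instead of 1. The toy data frame below mirrors the sample table; the column names `avg_employees` and `avg_annual_pay` are assumptions for illustration and may not match the actual field names in the BLS file.

```r
library(ggplot2)

# Toy data mirroring the sample table; column names are assumed.
toy <- data.frame(
  county         = 1:3,
  avg_employees  = c(34, 900, 85),
  avg_annual_pay = c(47000, 88000, 40000)
)

# weight = avg_employees makes each county add its employee count to its bin.
p <- ggplot(toy, aes(x = avg_annual_pay, weight = avg_employees)) +
  geom_histogram(binwidth = 5000, colour = "black", fill = "white") +
  scale_x_continuous(labels = scales::label_comma())

# Inspect the computed bins: the counts are sums of weights, not row counts.
binned <- ggplot_build(p)$data[[1]]
total_plotted <- sum(binned$count)
```

Here `total_plotted` equals the total number of employees across the three toy counties, which is what the bars add up to when weights are used.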

Edit:

    qcewGetIndustryData <- function (year, qtr, industry) {
      url <- "http://data.bls.gov/cew/data/api/YEAR/QTR/industry/INDUSTRY.csv"
      url <- sub("YEAR", year, url, ignore.case=FALSE)
      url <- sub("QTR", tolower(qtr), url, ignore.case=FALSE)
      url <- sub("INDUSTRY", industry, url, ignore.case=FALSE)
      read.csv(url, header = TRUE, sep = ",", quote="\"", dec=".", na.strings=" ", skip=0)
    }
    
    Construction <- qcewGetIndustryData("2015", "a", "1012")

Edit2:

> head(Construction[,1:5])
  area_fips own_code industry_code agglvl_code size_code
1     01000        3          1012          53         0
2     01000        5          1012          53         0
3     01001        5          1012          73         0
4     01003        5          1012          73         0
5     01005        5          1012          73         0
6     01007        5          1012          73         0

Can you provide `dput(Construction)` and any code you've tried out so far, even if you think it's clunky? Is the table you provided similar to the data you currently have or is the table indicative of what you want your final output to look like? — jrcalabrese, Feb 15 '23 at 18:12
I think you just want a weighted histogram: https://stackoverflow.com/questions/19841204/create-a-histogram-for-weighted-values — MrFlick, Feb 15 '23 at 18:23
@MrFlick yes and no. I had the same thought at first, but applying weights would still plot the distribution per county, when I want the distribution of salaries across the population as a whole. It sounds like it might work if I figured out a way to assign each county a weight proportional to its share of the total employee population. But that sounds like it's overcomplicating the problem, and I am not 100% confident the outcome would truly represent what I actually want. Thank you for the link, though — creetz, Feb 16 '23 at 23:56
@jrcalabrese I added the code that creates the Construction data set. I'm not familiar with dput, and the output when I run it on Construction contains so much text that it extends beyond what fits in the console window, so I'm not sure how to share it with you. The table I provided is an example of the current dataset. — creetz, Feb 17 '23 at 10:03
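On the concern raised in the comments above, a small base R check may help: weighting each county by its employee count should give the same bin totals as expanding every row once per employee, and dividing by the total only rescales the bins to shares. This is a minimal sketch on toy values from the question; the column names `avg_employees` and `avg_annual_pay` are assumed for illustration.

```r
# Toy data; column names are assumed for illustration.
toy <- data.frame(
  avg_employees  = c(34, 900, 85),
  avg_annual_pay = c(47000, 88000, 40000)
)

# Shared 5,000-wide bins covering the toy pay range.
breaks  <- seq(0, 100000, by = 5000)
pay_bin <- cut(toy$avg_annual_pay, breaks)

# Approach 1: expand each county row once per employee, then count rows per bin.
expanded      <- toy[rep(seq_len(nrow(toy)), times = toy$avg_employees), ]
bins_expanded <- table(cut(expanded$avg_annual_pay, breaks))

# Approach 2: keep one row per county and sum the employee weights per bin.
bins_weighted <- xtabs(avg_employees ~ pay_bin, data = toy)

# TRUE when both approaches produce the same total in every bin.
identical_bins <- all(as.vector(bins_expanded) == as.vector(bins_weighted))

# Proportional weights: each bin's share of all employees.
bin_share <- as.vector(bins_weighted) / sum(toy$avg_employees)
```

If `identical_bins` is TRUE, the weighted histogram and the row-expanded histogram show the same thing, without materialising one row per employee. Note that either way every employee in a county is placed at that county's average pay, since the data contains no within-county spread.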

