I'm trying to show the distribution of salaries for a particular occupation. The BLS data is reported by county. With the code below I almost get what I want, but the y axis counts rows, i.e. counties, rather than employees.
So a county with 10 employees and an average income of 50k counts the same as a county with 100 employees and an average income of 80k. I know I could expand each county row by its number of employees, giving 10 rows of 50k and 100 rows of 80k, but I'm sure there is a better approach; I just can't find it.
library(ggplot2)
library(scales)  # provides label_comma()
ggplot(Construction[which(Construction$avg_annual_pay > 0), ], aes(x = avg_annual_pay)) +
  geom_histogram(binwidth = 5000, colour = "black", fill = "white") +
  scale_x_continuous(labels = label_comma())
county | avg # employees | avg annual pay |
---|---|---|
1 | 34 | 47000 |
2 | 900 | 88000 |
3 | 85 | 40000 |
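To make the row-expansion workaround concrete, here is a minimal sketch using only the three sample rows above. The data frame `toy` and its column names (`avg_employees`, `avg_annual_pay`) are made up for illustration and are not the actual BLS column names; it also assumes the employee counts are whole numbers.

```r
library(ggplot2)

# Toy data copied from the sample table above (column names are assumed)
toy <- data.frame(
  county         = 1:3,
  avg_employees  = c(34, 900, 85),
  avg_annual_pay = c(47000, 88000, 40000)
)

# Repeat each county row once per employee, so each row stands for one employee
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$avg_employees), ]

# The histogram now counts employees instead of counties
p <- ggplot(expanded, aes(x = avg_annual_pay)) +
  geom_histogram(binwidth = 5000, colour = "black", fill = "white")
```

This gives the picture I'm after, but on the full data set it means building a data frame with one row per employee, which is what I'd like to avoid.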
I tried setting y = avg_employees, but geom_histogram doesn't accept both an x and a y aesthetic.
Edit:
qcewGetIndustryData <- function (year, qtr, industry) {
  # Fill the YEAR, QTR and INDUSTRY placeholders in the URL template, then read the CSV
  url <- "http://data.bls.gov/cew/data/api/YEAR/QTR/industry/INDUSTRY.csv"
  url <- sub("YEAR", year, url, ignore.case=FALSE)
  url <- sub("QTR", tolower(qtr), url, ignore.case=FALSE)
  url <- sub("INDUSTRY", industry, url, ignore.case=FALSE)
  read.csv(url, header = TRUE, sep = ",", quote="\"", dec=".", na.strings=" ", skip=0)
}
Construction <- qcewGetIndustryData("2015", "a", "1012")
Edit2:
> head(Construction[,1:5])
  area_fips own_code industry_code agglvl_code size_code
1     01000        3          1012          53         0
2     01000        5          1012          53         0
3     01001        5          1012          73         0
4     01003        5          1012          73         0
5     01005        5          1012          73         0
6     01007        5          1012          73         0