I am analyzing a data set of race, occupation, and income data for the Philadelphia region.
I was hoping to use ggplot to make several data visualizations, but I am having serious trouble getting even a single one to look normal. Every plot looks incredibly crowded. I am doing something wrong, maybe with ggplot, maybe with the factoring, but I am not sure which.
This is my latest one, an attempted scatterplot.
    ggplot(cps_data2, aes(x = INCWAGE_factor,
                          y = RACE_factor)) +
      xlab('Individual Income') +
      ylab('Race') +
      geom_point()
That gives me this:
Here is an example of how I factored my variables, followed by my data set information.
    cps_data2 <- cps_data2 %>%
      mutate(INCWAGE_factor = as_factor(INCWAGE))

     $ RACE_factor   : Factor w/ 9 levels "White","Black/African American/Negro",..: 1 1 2 2 1 8 2 1 1 2 ...
      ..- attr(*, "label")= chr "Race [general version]"
     $ OCC_factor    : Factor w/ 429 levels "0","10","20",..: 42 302 1 22 254 291 1 112 418 1 ...
      ..- attr(*, "label")= chr "Occupation"
     $ INCWAGE_factor: Factor w/ 654 levels "0","20","50",..: 521 283 1 529 328 311 1 1 283 1 ...
      ..- attr(*, "label")= chr "Wage and salary income"
     $ SEX_factor    : Factor w/ 2 levels "Male","Female": 2 1 2 1 2 1 1 2 1 2 ...
      ..- attr(*, "label")= chr "Sex"
     $ CITY_factor   : Factor w/ 1157 levels "Not in identifiable city (or size group)",..: 814 814 814 814 814 814 814 814 814 814 ...
      ..- attr(*, "label")= chr "City"
     $ AGE_factor    : Factor w/ 46 levels "Less than 1 year old",..: 14 12 37 18 35 14 39 41 37 36 ...
      ..- attr(*, "label")= chr "Age"
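
One thing that may be contributing to the crowding, offered as a guess rather than a confirmed diagnosis: `INCWAGE_factor` has 654 levels, so ggplot treats income as a discrete axis with one tick per distinct value. Below is a minimal sketch of converting such a factor back to numbers and plotting income by race as boxplots instead. The `toy` data frame, its values, and the shortened race labels are hypothetical stand-ins for `cps_data2`, and `INCWAGE_num` is a name introduced only for this illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for cps_data2 (values made up for illustration)
toy <- data.frame(
  RACE_factor    = factor(c("White", "Black", "White", "Asian", "Black", "White")),
  INCWAGE_factor = factor(c("0", "20", "52000", "31000", "0", "75000"))
)

# Go through as.character() first: as.numeric() applied directly to a factor
# returns the internal level codes (1, 2, 3, ...), not the income values
toy$INCWAGE_num <- as.numeric(as.character(toy$INCWAGE_factor))

# With a numeric income, ggplot uses a continuous axis with a handful of breaks
p <- ggplot(toy, aes(x = RACE_factor, y = INCWAGE_num)) +
  geom_boxplot() +
  xlab("Race") +
  ylab("Individual Income")

print(p)
```

If the original `INCWAGE` column is already numeric, it could presumably be mapped directly instead of the factored copy; that depends on how the data were imported, which is not shown here.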