I'm pretty new to the world of R, so please be patient with me ;-)
For the last two weeks I've been playing with an old dataset of mine, trying to figure out some things in R. What you need to know about my data to help me: for each person I have their age, their education (school type), their sex, and the money they spent on a trip. I'd like to generate two kinds of plots from this data. First, I'd like to create a scatter plot with age on the x-axis and money spent on the y-axis, and color-code the dots by school type/education.
This kinda works with this code:
scatter <- ggplot(spending.analysis, aes(age, money), na.action=na.exclude)
scatter +
  geom_point(aes(color = school), alpha = 0.7) +
  geom_smooth(method = "lm", color = "dark blue", alpha = 0.1, fill = "blue")
But unfortunately, it uses the default colors, which I don't really like, so I'd like to tell R to use color A for school type a, color B for school type b, etc. So far I couldn't make that happen.
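A minimal sketch of one common way to fix the colors per school type in ggplot2: pass a named vector to scale_color_manual(), where the names are the factor levels and the values are the colors. The data frame `toy` and its values are made up as a stand-in for spending.analysis, and the colors are arbitrary picks (both are assumptions, not part of the real data).

```r
library(ggplot2)

# Hypothetical stand-in for spending.analysis: same column names, made-up values
set.seed(1)
school.levels <- c("noch in Schulausbildung", "ohne Abschluss",
                   "Hauptschule", "Realschule", "Gymnasium")
toy <- data.frame(
  age    = sample(18:70, 100, replace = TRUE),
  money  = round(runif(100, min = 100, max = 2000)),
  school = factor(sample(school.levels, 100, replace = TRUE),
                  levels = school.levels)
)

# Named vector: names must match the school types exactly,
# values are the colors to use (arbitrary choices for illustration)
school.colors <- c("noch in Schulausbildung" = "grey50",
                   "ohne Abschluss"          = "firebrick",
                   "Hauptschule"             = "darkorange",
                   "Realschule"              = "forestgreen",
                   "Gymnasium"               = "steelblue")

p <- ggplot(toy, aes(age, money)) +
  geom_point(aes(color = school), alpha = 0.7) +
  geom_smooth(method = "lm", color = "darkblue", alpha = 0.1, fill = "blue") +
  scale_color_manual(values = school.colors)  # overrides the default palette
print(p)
```

Because the vector is named, the assignment does not depend on the order of the factor levels, so the same colors would carry over to a faceted version of the plot.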
Another approach I tried was to split the data by school type, which also failed miserably...
scatter.ns <- subset(spending.analysis, school == "noch in Schulausbildung")
scatter.oa <- subset(spending.analysis, school == "ohne Abschluss")
scatter.hs <- subset(spending.analysis, school == "Hauptschule")
scatter.rs <- subset(spending.analysis, school == "Realschule")
scatter.gym <- subset(spending.analysis, school == "Gymnasium")
scatter2 <- ggplot(scatter.hs, scatter.rs, scatter.gym) +
  geom_point(aes())
My second idea - this isn't a real analysis, I'm just playing around, trying to learn and understand R - was to facet the plot, so that every school type gets a scatter plot of its own.
scatter <- ggplot(spending.analysis, aes(age, money), na.action=na.exclude)
scatter +
  geom_point(aes(color = school), alpha = 0.7) +
  geom_smooth(method = "lm", color = "dark blue", alpha = 0.1, fill = "blue") +
  facet_grid(. ~ school)
Again, this code kinda works, but I still don't know how to assign each plot/school type a color of my choosing. And for some weird reason there's a scatter plot for the NAs as well, which is pretty confusing to me. Is there a way to exclude data from being plotted? Basically, I think dropping the first and the last plot would make sense (see http://de.tinypic.com/r/2hhkp5l/8 ).
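A minimal sketch of one way to keep unwanted groups out of the facets: filter the data frame before handing it to ggplot(), then drop the now-unused factor levels. The data frame `toy` is again a made-up stand-in for spending.analysis, and treating "noch in Schulausbildung" as the first panel is an assumption; the real first panel may be a different school type.

```r
library(ggplot2)

# Hypothetical stand-in for spending.analysis, this time with some NAs in school
set.seed(2)
school.levels <- c("noch in Schulausbildung", "ohne Abschluss",
                   "Hauptschule", "Realschule", "Gymnasium")
toy <- data.frame(
  age    = sample(18:70, 120, replace = TRUE),
  money  = round(runif(120, min = 100, max = 2000)),
  school = factor(sample(c(school.levels, NA), 120, replace = TRUE),
                  levels = school.levels)
)

# Keep only rows with a known school type, and drop one type by name
# (assumption: this is the group shown in the first panel)
keep <- subset(toy, !is.na(school) & school != "noch in Schulausbildung")

# Remove factor levels that no longer occur in the filtered data
keep$school <- droplevels(keep$school)

p <- ggplot(keep, aes(age, money)) +
  geom_point(aes(color = school), alpha = 0.7) +
  geom_smooth(method = "lm", color = "darkblue", alpha = 0.1, fill = "blue") +
  facet_grid(. ~ school)  # one panel per remaining school type
print(p)
```

Filtering the data rather than the plot means the points, the regression lines, and the facets all see the same rows.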
Sorry for the long post, but it's really hard as a beginner and I really really tried to figure it out by myself.
Thank you SO much for your advice - and please keep it understandable for a beginner ;-)