I'm pretty new to the world of R, so please be patient with me ;-)

For the last two weeks I've been playing with an old dataset of mine, trying to figure out some stuff in R. What you need to know about my data is that it describes various people: their age, their education, their sex and the money they spent on a trip. I'd like to generate two kinds of plots with this data. First of all, I'd like to create a scatterplot with age on the x-axis and money spent on the y-axis. Then I'd like to color-code the dots by school type (their education).

This kinda works with this code:

scatter <- ggplot(spending.analysis, aes(age, money), na.action=na.exclude)
scatter + 
        geom_point(aes(color = school), alpha = 0.7) +
        geom_smooth(method = "lm", color = "dark blue", alpha = 0.1, fill = "blue")

But unfortunately, it uses the default colors, which I don't really like, so I'd like to tell R to use color A for school type a, color B for school type b, etc. So far I couldn't make that happen.

My next attempt was the following, which also failed miserably...

scatter.ns <- subset(spending.analysis, school == "noch in Schulausbildung")
scatter.oa <- subset(spending.analysis, school == "ohne Abschluss")
scatter.hs <- subset(spending.analysis, school == "Hauptschule")
scatter.rs <- subset(spending.analysis, school == "Realschule")
scatter.gym <- subset(spending.analysis, school == "Gymnasium")


scatter2 <- ggplot(scatter.hs, scatter.rs, scatter.gym) +
            geom_point(aes())

My second idea - it's not about real analysis, it's just playing around, trying to learn and understand R - was to facet the plot, so that I'd get every school type in a scatterplot on its own.

scatter <- ggplot(spending.analysis, aes(age, money), na.action=na.exclude)
scatter + 
        geom_point(aes(color = school), alpha = 0.7) +
        geom_smooth(method = "lm", color = "dark blue", alpha = 0.1, fill = "blue") +
        facet_grid(. ~ school)

Again, this code kinda works, but I still don't know how to assign each plot/school type a color of my preference. And for some weird reason there's a scatterplot for the NAs as well, which is pretty confusing to me. Is there a way to exclude data from being plotted?! Basically I think kicking out the first and the last plot would make sense. (see http://de.tinypic.com/r/2hhkp5l/8 )

Sorry for the long post, but it's really hard as a beginner and I really really tried to figure it out by myself.

Thank you SO much for your advice - and please keep it understandable for a beginner ;-)

RJW

2 Answers

There are multiple ways you can achieve your goal. First, if you want to generate separate scatter plots and then merge them, you can use the multiplot function. You would simply generate the graphs you want, with all the settings, and then merge them.

As a second approach, you can look at the GGally package; its ggpairs function will let you produce different scatter plot matrices, if that is what you are after.

Third, you can play with defining a colour vector such as groupColors <- c('aquamarine3','chartreuse1','goldenrod1') and passing it to scale_color_manual(values = groupColors) in your scatter plot definition.
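
A minimal sketch of such a manual colour mapping, in case it helps. The data frame `demo` is a made-up stand-in for your data (the real one was not shared), and the school types and colours are only examples:

```r
library(ggplot2)

# Hypothetical stand-in for the real data, which was not shared.
set.seed(1)
demo <- data.frame(
  age    = sample(18:70, 60, replace = TRUE),
  money  = round(runif(60, 100, 2000)),
  school = sample(c("Hauptschule", "Realschule", "Gymnasium"), 60, replace = TRUE)
)

# Named vector: the names must match the values in `school` exactly,
# so each school type always gets the same colour.
groupColors <- c("Hauptschule" = "aquamarine3",
                 "Realschule"  = "chartreuse1",
                 "Gymnasium"   = "goldenrod1")

p <- ggplot(demo, aes(age, money)) +
  geom_point(aes(color = school), alpha = 0.7) +
  scale_color_manual(values = groupColors)
p
```

Naming the vector elements is what ties a colour to a school type; an unnamed vector would be matched by position instead.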

Finally, you could settle on one of the ggplot themes, where you can define a whole colour palette and other details. As for the second part of your question, concerning the NAs: it would be better if you could share some data, but in principle you should be able to try something along these lines:

ggplot(na.omit(your.data.frame[, c("variable1", "variable2")]), aes(x = variable1, y = variable2))

and then progress with your scatter plot definition.
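
To illustrate why it can pay off to select the columns before calling na.omit, here is a small sketch. The data frame `demo` and its `note` column are invented for the example:

```r
library(ggplot2)

# Hypothetical data with missing values, including in a column (`note`)
# that the plot does not use at all.
demo <- data.frame(
  age    = c(25, 40, NA, 33, 51),
  money  = c(300, NA, 450, 800, 620),
  school = c("Gymnasium", "Realschule", NA, "Hauptschule", NA),
  note   = c(NA, "x", "y", NA, "z")
)

# Keep only the columns the plot needs, then drop incomplete rows.
plot.cols <- c("age", "money", "school")
clean <- na.omit(demo[, plot.cols])

nrow(clean)          # 2 rows are complete in age, money and school
nrow(na.omit(demo))  # 0 rows: every row has an NA somewhere

p <- ggplot(clean, aes(x = age, y = money)) +
  geom_point(aes(color = school))
p
```

Regarding na.omit versus na.exclude: both drop incomplete rows. According to their help page, they differ only for model functions such as lm, where na.exclude pads residuals and predictions with NA so that they keep the original length.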

Konrad
  • Wow, that's a very profound answer! Thank you very much for taking the time to explain it in more detail and on a level even a beginner can understand! Great! I'll do some research on the multiplot function as this sounds very promising, just like the groupColors (I've never heard of them before!). The themes sound good, too, but maybe that's a little bit too much for me at this level. But thanks for the hint nevertheless! I thought I took care of the NAs in the first line of the last code block of my original post. When do I use omit and when exclude? Thanks again so much!!! – RJW Feb 13 '15 at 16:48
  • Pardon? What do you mean? – RJW Feb 13 '15 at 16:53

As I was asked to make my problem/help more understandable here's the code that's working so far. For some unknown reason it's plotting the NAs, too. But as I'm still learning and not working on a "real project" with that data, that's okay and just a minor issue.

So here's my code:

library(ggplot2)

# Named vector: each school type is tied to one fixed hex colour
group.colors <- c("noch in Schulausbildung" = "#D11141",
                  "ohne Abschluss"          = "#00B159",
                  "Hauptschule"             = "#00AEDB",
                  "Realschule"              = "#F37735",
                  "Gymnasium"               = "#FFC425")
scatter <- ggplot(spending.analysis, aes(age, money), na.action = na.exclude)
scatter +
        geom_point(aes(color = school), alpha = 0.7) +
        geom_smooth(method = "lm", color = "dark blue", alpha = 0.05, fill = "blue", na.action = na.exclude) +
        facet_grid(. ~ school) +
        theme_bw() +
        scale_color_manual(values = group.colors)
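
For the NA panel, one possible fix is sketched below. As far as I can tell, ggplot() does not use an na.action argument, so rows with a missing school type still reach facet_grid and get a panel of their own. Dropping those rows before plotting avoids that. The data frame `demo` is invented for the example and only uses two school types:

```r
library(ggplot2)

# Hypothetical stand-in for spending.analysis, with two rows
# where the school type is missing.
demo <- data.frame(
  age    = c(19, 34, 45, 52, 61, 27, 38, 49, 23, 58),
  money  = c(250, 900, 400, 1200, 700, 300, 650, 1100, 500, 950),
  school = c("Gymnasium", "Realschule", NA, "Gymnasium", "Realschule",
             NA, "Gymnasium", "Realschule", "Gymnasium", "Realschule")
)

# Remove the rows without a school type before they reach ggplot.
plot.data <- subset(demo, !is.na(school))

group.colors <- c("Realschule" = "#F37735", "Gymnasium" = "#FFC425")

p <- ggplot(plot.data, aes(age, money)) +
  geom_point(aes(color = school), alpha = 0.7) +
  geom_smooth(method = "lm", color = "dark blue", alpha = 0.05, fill = "blue") +
  facet_grid(. ~ school) +
  theme_bw() +
  scale_color_manual(values = group.colors)
p
```

The same subset() call can also be used to leave out any other school type that should not get its own panel.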

I hope this helps others with their problems :-)

RJW