
I have a dataset, df, with more than 10,000 rows. The first 30 rows are:

>df
      ms_estimate moso_estimate         sig
1554     6.518196     0.8782018          NS
825      6.170754     5.1146423 ms and moso
709      4.164373            NA        <NA>
13025    4.269822     5.7502859 ms and moso
2269     2.905754     0.7512660          NS
6714     3.401530     3.3315667          NS
14984    2.713234            NA        <NA>
7423     1.935319    -0.5283304          NS
8453     2.123371     0.1680088          NS
906            NA     0.0382903        <NA>
14196          NA     0.0382903        <NA>
10033    2.280660     3.1261748          ms
16397    2.280660     3.1261748          ms
4647     2.159354     1.5308502          NS
5121     1.847211     0.1912870          NS
4245     1.478000     0.5877055          NS
4732     1.973196     3.0805554        moso
4733     1.973196     3.0805554        moso
14411    1.776247     0.9723628          ms
9760     1.740305    -2.3284208 ms and moso
12158    1.720102     0.9989511          NS
7741     1.758581     0.2117089          ms
14883    1.788952            NA        <NA>
2315     1.832134     0.3518875          NS
4849     1.779664    -0.2311154          NS
7266     1.226592     0.5295427          NS
7189     1.716813     0.3342551          NS
253      1.667899     0.1715527          ms
13456    1.687443     0.4861952          ms
13518    1.542558     0.5361044 ms and moso

I want to make a scatter plot of 'moso_estimate' vs 'ms_estimate' and color the points according to whether they are significant in ms_estimate, in moso_estimate, in both, or in neither (encoded by the 'sig' variable). To avoid overplotting (by the points with sig == "NS"), I need to add the data in layers according to the 'sig' variable (using subset() and .() from the plyr package), with alpha = 0.2. The first layer is "NS" and the last should be "ms and moso". This works fine with the code below, except that I cannot control the legend when doing it this way. Is there a way to set the legend colors manually, preferably with alpha = 1? Here's the code:

g <- ggplot(data = df)
g +
     aes(x = ms_estimate, y = moso_estimate) +
     geom_point(color = "grey", shape = 20, alpha=1, aes(fill = "NS")) +
     geom_point(subset = .(sig == "ms"), color = "green", shape = 20, alpha = 0.2, aes(fill = "ms")) +
     geom_point(subset = .(sig == "moso"), color = "blue", shape = 20, alpha = 0.2, aes(fill = "moso")) +
     geom_point(subset = .(sig == "ms and moso"), color = "red", shape = 20, alpha = 1, aes(fill = "ms and moso")) +
     xlim(-5, 5) + ylim(-5,5)

[Figure]

[Figure: whole data set plotted with the code from Matthew Plourde]
[Figure: whole data set plotted with own code, without legends]

user3375672

1 Answer

You want to do something like this instead:

ggplot(data=df[complete.cases(df),]) +
    aes(x=ms_estimate, y=moso_estimate, color=sig, alpha=sig) +
    geom_point(shape=20) +
    scale_colour_manual(values=c(NS='grey', ms='green', moso='blue', `ms and moso`='red')) +
    scale_alpha_manual(values=c(NS=1, ms=.2, moso=.2, `ms and moso`=1))
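
The question also asks for legend keys drawn at alpha = 1. A minimal sketch of one way to get that, assuming a reasonably recent ggplot2: hide the separate alpha legend with `guide = "none"` and override alpha in the colour legend through `guide_legend(override.aes = ...)`. The data frame `toy` is hypothetical and only stands in for `df`; its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for df (values made up for illustration)
toy <- data.frame(
  ms_estimate   = c(0.5, 1.2, -0.8, 2.1),
  moso_estimate = c(0.1, 3.0, -2.3, 0.9),
  sig           = c("NS", "moso", "ms and moso", "ms")
)

p <- ggplot(toy, aes(x = ms_estimate, y = moso_estimate, colour = sig, alpha = sig)) +
  geom_point(shape = 20) +
  scale_colour_manual(values = c(NS = "grey", ms = "green", moso = "blue",
                                 `ms and moso` = "red")) +
  # guide = "none" drops the separate alpha legend
  scale_alpha_manual(values = c(NS = 1, ms = 0.2, moso = 0.2, `ms and moso` = 1),
                     guide = "none") +
  # draw the colour legend keys fully opaque, whatever alpha the points use
  guides(colour = guide_legend(override.aes = list(alpha = 1)))

print(p)
```

The points keep their per-group transparency; only the legend keys are forced to be opaque.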
Matthew Plourde
  • The suggested code doesn't take care of overplotting; see the figure above with the whole data set (>10000 points) using the suggested code. That's why I need to add the points in layers; see the figure above with all the data but without the legends (which is what I need to get working). I cannot see how the suggested thread answers my problem, sorry. – user3375672 Aug 06 '14 at 12:59
  • Just order your rows according to the order you want them plotted ... – Matthew Plourde Aug 06 '14 at 13:11
  • e.g., `df <- df[order(match(df$sig, c('', 'NS', 'ms', 'moso', 'ms and moso'))), ]` – Matthew Plourde Aug 06 '14 at 13:12
  • Perfect, it works nicely! – user3375672 Aug 06 '14 at 20:15
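
The row-ordering trick from the comments works because `geom_point()` draws points in row order within a layer, so rows that come later end up on top. A small base-R sketch of the reordering step, using a hypothetical data frame `toy` with made-up values; `draw_order` is an illustrative name, not something from the thread:

```r
# Hypothetical toy data (values made up for illustration)
toy <- data.frame(
  ms_estimate   = c(1.5, 2.3, 0.4, 1.9, 1.7),
  moso_estimate = c(0.5, 3.1, 0.2, 3.0, -2.3),
  sig           = c("ms and moso", "NS", "ms", "NS", "moso"),
  stringsAsFactors = FALSE
)

# Desired drawing order: first entry is plotted first (bottom), last entry on top
draw_order <- c("NS", "ms", "moso", "ms and moso")

# match() gives each row its position in draw_order; order() sorts rows by that position
toy_sorted <- toy[order(match(toy$sig, draw_order)), ]

print(toy_sorted$sig)
```

Passing `toy_sorted` (or the sorted `df`) to the single-layer `ggplot()` call then puts the "NS" points at the bottom and the "ms and moso" points on top, while the mapped `colour` aesthetic still produces a legend.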