I have a dataset, df, with more than 10,000 rows. The first 30 rows are:
> df
      ms_estimate moso_estimate         sig
1554     6.518196     0.8782018          NS
825      6.170754     5.1146423 ms and moso
709      4.164373            NA        <NA>
13025    4.269822     5.7502859 ms and moso
2269     2.905754     0.7512660          NS
6714     3.401530     3.3315667          NS
14984    2.713234            NA        <NA>
7423     1.935319    -0.5283304          NS
8453     2.123371     0.1680088          NS
906            NA     0.0382903        <NA>
14196          NA     0.0382903        <NA>
10033    2.280660     3.1261748          ms
16397    2.280660     3.1261748          ms
4647     2.159354     1.5308502          NS
5121     1.847211     0.1912870          NS
4245     1.478000     0.5877055          NS
4732     1.973196     3.0805554        moso
4733     1.973196     3.0805554        moso
14411    1.776247     0.9723628          ms
9760     1.740305    -2.3284208 ms and moso
12158    1.720102     0.9989511          NS
7741     1.758581     0.2117089          ms
14883    1.788952            NA        <NA>
2315     1.832134     0.3518875          NS
4849     1.779664    -0.2311154          NS
7266     1.226592     0.5295427          NS
7189     1.716813     0.3342551          NS
253      1.667899     0.1715527          ms
13456    1.687443     0.4861952          ms
13518    1.542558     0.5361044 ms and moso
I want to make a scatter plot of 'moso_estimate' vs 'ms_estimate' and color the points according to whether they are significant in ms_estimate, in moso_estimate, in both, or in neither (encoded by the 'sig' variable). To avoid overplotting (by the points with sig == "NS"), I need to add the data in layers according to the 'sig' variable (using subset() and .() from the plyr package), with alpha = 0.2. The first layer is "NS" and the last should be "ms and moso". This works fine with the code below, except that I cannot control the legend when doing it this way. Is there a way to set the legend colors manually, preferably with alpha = 1? Here's the code:
library(ggplot2)
library(plyr)  # provides .() used in the subset arguments

g <- ggplot(data = df)
g +
  aes(x = ms_estimate, y = moso_estimate) +
  # bottom layer: all points, labelled "NS"
  geom_point(color = "grey", shape = 20, alpha = 1, aes(fill = "NS")) +
  geom_point(subset = .(sig == "ms"), color = "green", shape = 20, alpha = 0.2, aes(fill = "ms")) +
  geom_point(subset = .(sig == "moso"), color = "blue", shape = 20, alpha = 0.2, aes(fill = "moso")) +
  # top layer: significant in both
  geom_point(subset = .(sig == "ms and moso"), color = "red", shape = 20, alpha = 1, aes(fill = "ms and moso")) +
  xlim(-5, 5) + ylim(-5, 5)
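
For reference, here is a minimal sketch of one direction that might give this kind of legend control; it is not a tested solution for the real data. It assumes that mapping color to 'sig' in a single layer is acceptable, that sorting the rows by 'sig' is enough to draw "NS" first and "ms and moso" last, and that a single alpha = 0.2 for all points is fine. The toy df and the names lev and plot_df are made up for illustration only.

```r
library(ggplot2)

# Toy stand-in for the real df (hypothetical values, same column names)
set.seed(1)
n <- 500
df <- data.frame(
  ms_estimate   = rnorm(n),
  moso_estimate = rnorm(n),
  sig = sample(c("NS", "ms", "moso", "ms and moso", NA), n, replace = TRUE,
               prob = c(0.70, 0.10, 0.10, 0.05, 0.05)),
  stringsAsFactors = FALSE
)

# Fix the level order: "NS" first, "ms and moso" last
lev <- c("NS", "ms", "moso", "ms and moso")
df$sig <- factor(df$sig, levels = lev)

# Drop rows with missing values, then sort by 'sig' so that
# later levels are drawn on top of earlier ones
plot_df <- df[complete.cases(df), ]
plot_df <- plot_df[order(plot_df$sig), ]

g <- ggplot(plot_df, aes(x = ms_estimate, y = moso_estimate, color = sig)) +
  geom_point(shape = 20, alpha = 0.2) +
  # colors are set manually, keyed by the values of 'sig'
  scale_color_manual(
    values = c("NS" = "grey", "ms" = "green", "moso" = "blue", "ms and moso" = "red")
  ) +
  # legend keys are drawn fully opaque even though the points are not
  guides(color = guide_legend(override.aes = list(alpha = 1))) +
  xlim(-5, 5) + ylim(-5, 5)

print(g)
```

Here the legend comes from the color scale, so scale_color_manual() controls the colors and override.aes changes only how the legend keys are drawn, not the points themselves.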