
Could anyone help me change the color of points (drawn with geom_point) in R?

I need to set, for example, different colors for points that lie more than 3 standard deviations above or below the mean of the dataset.

The plot is the following:

[Image: the plot in which I need to change the color of the points]

Konrad Rudolph
Brietzke
  • It's very difficult to know how we can help you without seeing your existing code and a sample of your data. – Allan Cameron Feb 22 '22 at 18:33
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What exactly do you mean by "standard deviation of the dataset". Is one of these variables you are plotting a standard deviation? Is that line you are plotting some sort of fit line and want to indicate points outside the prediction window? – MrFlick Feb 22 '22 at 18:48
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 22 '22 at 22:56
  • Thank you for the comments. Sorry, it was my first time writing here, so I will give a reproducible example. – Brietzke Mar 01 '22 at 14:33

1 Answer

Since you didn't provide a minimal example, I've used the iris dataset to show how I would do it.

Packages

library(dplyr)
library(ggplot2)

First, let's find the mean and SD of the variable of interest.

# Find mean and SD
iris %>% 
  summarise(mean_Sepal.Width = mean(Sepal.Width),
            sd_Sepal.Width = sd(Sepal.Width))

>  mean_Sepal.Width sd_Sepal.Width
>1         3.057333      0.4358663
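
The two cut-offs can also be stored once and reused, rather than typing rounded numbers by hand. A minimal sketch in base R; the names `m`, `s`, `lower` and `upper` are only illustrative and are not used elsewhere in this answer:

```r
# Mean and SD of the variable being plotted
m <- mean(iris$Sepal.Width)
s <- sd(iris$Sepal.Width)

# Cut-offs 3 SD either side of the mean
lower <- m - 3 * s
upper <- m + 3 * s

# Show both thresholds as a named vector
c(lower = lower, upper = upper)
```

Any value at or above `upper`, or at or below `lower`, would then count as an outlier under the rule used here.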

Now, I would create a new column with mutate and use case_when to code whether each observation is at least 3 SD above the mean, at least 3 SD below it, or in between.

Finally, put color = color_group (the new variable we created) inside aes(), and you are good to go.

Solution 1

iris %>% 
  # 3.06 and .436 are the rounded mean and SD found above
  mutate(color_group = case_when(Sepal.Width >= 3.06 + 3 * .436 ~ "above 3SD",
                                 Sepal.Width <= 3.06 - 3 * .436 ~ "below 3SD",
                                 TRUE ~ "around the mean")) %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = color_group)) + 
  geom_point() 

Solution 2

To automate the process, you can calculate the mean and SD on the fly inside mutate. Both solutions give the same plot for this data.

iris %>% 
  mutate(color_group = case_when(Sepal.Width >= mean(Sepal.Width) + 3 * sd(Sepal.Width) ~ "above 3SD",
                                 Sepal.Width <= mean(Sepal.Width) - 3 * sd(Sepal.Width) ~ "below 3SD",
                                 TRUE ~ "around the mean")) %>%
  as_tibble() %>%  # optional; ggplot() also accepts a plain data frame
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = color_group)) + 
  geom_point() 
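
If you want to pick the colors yourself instead of using the ggplot2 defaults, one option is scale_color_manual(). This is a sketch, not the only way to do it: the palette `group_colors` and the specific colors are my own assumptions, and the names in it have to match the labels produced by case_when().

```r
library(dplyr)
library(ggplot2)

# Hypothetical palette: names must match the labels created in case_when()
group_colors <- c("above 3SD"       = "red",
                  "below 3SD"       = "blue",
                  "around the mean" = "grey40")

p <- iris %>%
  mutate(color_group = case_when(
    Sepal.Width >= mean(Sepal.Width) + 3 * sd(Sepal.Width) ~ "above 3SD",
    Sepal.Width <= mean(Sepal.Width) - 3 * sd(Sepal.Width) ~ "below 3SD",
    TRUE ~ "around the mean")) %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = color_group)) +
  geom_point() +
  # Map each group label to a fixed color
  scale_color_manual(values = group_colors)

p
```

Using a named vector keeps each group tied to the same color even when one of the groups is empty in a given dataset.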

Output

In this case there are few outliers, so not all of the colors appear. With data that has outliers on both sides of the mean, you will see all three colors.
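
To check how many points end up in each group before plotting, the groups can be tabulated. A small sketch using count() from dplyr; the object name `groups` is just illustrative:

```r
library(dplyr)

# Classify each observation, then count the rows per group
groups <- iris %>%
  mutate(color_group = case_when(
    Sepal.Width >= mean(Sepal.Width) + 3 * sd(Sepal.Width) ~ "above 3SD",
    Sepal.Width <= mean(Sepal.Width) - 3 * sd(Sepal.Width) ~ "below 3SD",
    TRUE ~ "around the mean")) %>%
  count(color_group)

groups
```

For iris, only the largest Sepal.Width value (4.4) reaches the upper cut-off and nothing reaches the lower one, which is why the plot shows just two colors.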


Ruam Pimentel