add legend and smoothing to a graph

Question

I want to plot these curves with a legend, but the legend does not appear. Does anyone know how I can make it show up? Also, is it possible to make the plots neater and smoother? Each curve has 165 data points. Is it possible to plot the average of every 16 points instead (i.e. about 10 points per curve rather than 165)? Here is the code:

ggplot() + 
  geom_ribbon(data=file1, aes(x = No., ymax = A1_10 + S1_10, ymin = A1_10 - S1_10),alpha = 0.20, fill=cl[1])+
  geom_line(data = file1, aes(x = No., y = A1_10), colour="red")+
  geom_ribbon(data=file1, aes(x = No., ymax = A2_10 + S2_10, ymin = A2_10 - S2_10),alpha = 0.20, fill=cl[3])+
  geom_line(data = file1, aes(x = No., y = A2_10), colour=cl[3])+
  geom_ribbon(data=file1, aes(x = No., ymax = A3_10 + S3_10, ymin = A3_10 - S3_10),alpha = 0.20, fill=cl[11])+
  geom_line(data = file1, aes(x = No., y = A3_10), colour=cl[11])+
  geom_ribbon(data=file1, aes(x = No., ymax = A5_10 + S5_10, ymin = A5_10 - S5_10),alpha = 0.20, fill=cl[8])+
  geom_line(data = file1, aes(x = No., y = A5_10), colour=cl[8])+
  geom_ribbon(data=file1, aes(x = No., ymax = A6_10 + S6_10, ymin = A6_10 - S6_10),alpha = 0.20, fill=cl[12])+
  geom_line(data = file1, aes(x = No., y = A6_10), colour=cl[8])+
  geom_ribbon(data=file1, aes(x = No., ymax = A7_10 + S7_10, ymin = A7_10 - S7_10),alpha = 0.20, fill=cl[18])+
  geom_line(data = file1, aes(x = No., y = A7_10), colour=cl[8])+
  xlab("Number")+ylab("Value")+
  theme(legend.position="bottom")

score 0 · Answer 1 · answered Feb 17 '20 at 20:49

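The legend is missing because colour and fill are set as constants outside aes(), and ggplot2 only builds a legend for aesthetics that are mapped inside aes(). You could map a label in each geom_line and geom_ribbon and assign the colours with a manual scale, but repeating that for every layer gets quite messy.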
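As a minimal sketch of that per-layer approach: the data frame `toy` and its columns `A1` and `A2` are hypothetical stand-ins for your file, and the labels "A1" and "A2" are arbitrary legend names, not anything taken from your data.

```r
library(ggplot2)

# Hypothetical toy data standing in for file1 (column names are assumptions)
toy <- data.frame(No = 1:10, A1 = (1:10) / 2, A2 = 5 - (1:10) / 4)

p <- ggplot(toy, aes(x = No)) +
  # colour is mapped inside aes(), so each label becomes a legend entry
  geom_line(aes(y = A1, colour = "A1")) +
  geom_line(aes(y = A2, colour = "A2")) +
  # the actual colours are assigned to the labels here
  scale_colour_manual(name = "Series", values = c(A1 = "red", A2 = "blue")) +
  theme(legend.position = "bottom")

print(p)
```

This works for two or three curves, but every extra curve needs another layer and another entry in the manual scale.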

A better way to use ggplot2 is to reshape your data into a longer format. As you did not provide a reproducible example of your dataset in your question, I created a dummy example (provided below) with 7 columns: one containing the numbers from 1 to 165, three columns with values, and three columns with (fake) standard deviations.

To reshape your dataframe, I used the pivot_longer and pivot_wider functions from the tidyr package in order to obtain one column for the number ("No"), one column for the "y" values, one column for the "s" values and one column for the categorical variable ("var"):

library(tidyr)
library(dplyr)
df %>% pivot_longer(-No, names_to = "var", values_to = "val") %>%
  mutate(Col = sub("\\d","",var), var = sub("\\w","",var)) %>%
  pivot_wider(names_from = Col, values_from = val)

# A tibble: 495 x 4
      No var        y      s
   <int> <chr>  <dbl>  <dbl>
 1     1 1      2.28   1.14 
 2     1 2      5.02   2.01 
 3     1 3      2.14   0.427
 4     2 1      2.57   1.28 
 5     2 2      5.06   2.02 
 6     2 3      2.07   0.413
 7     3 1     -0.201 -0.100
 8     3 2      4.42   1.77 
 9     3 3      1.32   0.264
10     4 1      0.562  0.281
# … with 485 more rows

Now you can pipe that result straight into the plotting code using ggplot2:

library(tidyr)
library(dplyr)
library(ggplot2)
df %>% pivot_longer(-No, names_to = "var", values_to = "val") %>%
  mutate(Col = sub("\\d","",var), var = sub("\\w","",var)) %>%
  pivot_wider(names_from = Col, values_from = val) %>%
  ggplot(aes(x = No, y = y, color = as.factor(var)))+
  geom_line()+
  geom_ribbon(aes(ymin = y-s, ymax= y+s, fill = as.factor(var)), alpha = 0.2)+
  scale_color_manual(values = c("red","blue","orange"))+
  scale_fill_manual(values = c("red","blue","orange"))+
  theme(legend.position = "bottom")

As you can see, you no longer need to repeat geom_line and geom_ribbon for each original variable, and you get a legend that you can customize further.
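If your columns are named like A1_10 and S1_10, as in the question, the reshaping can probably be done in a single pivot_longer call using the special ".value" name. This is only a sketch: `file1_toy` is a hypothetical stand-in for your file, and the regular expression assumes every column follows the pattern letter, number, "_10".

```r
library(tidyr)

# Hypothetical stand-in for file1; the real column names may differ
file1_toy <- data.frame(
  No.   = 1:3,
  A1_10 = c(1.0, 1.2, 0.9), S1_10 = c(0.1, 0.2, 0.1),
  A2_10 = c(4.0, 4.1, 3.8), S2_10 = c(0.3, 0.2, 0.4)
)

long <- pivot_longer(
  file1_toy,
  cols = -No.,
  # ".value" turns the first capture group (A or S) into column names;
  # the second capture group (the series number) goes into "var"
  names_to = c(".value", "var"),
  names_pattern = "([AS])(\\d+)_10"
)

# long now has the columns No., var, A (values) and S (standard deviations)
print(long)
```

The result has the same shape as the table above, with A and S in place of y and s, so the plotting code only needs those names swapped.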

Regarding the second question, you can average your dataframe over every 16 data points by using the cut function. However, as you have 165 rows, you will get 11 groups and not 10. Based on the previous code, you can do:

library(tidyr)
library(dplyr)
df %>% mutate(Group = cut(No, breaks = c(seq(1, 165, by = 16), 165), include.lowest = TRUE, labels = 1:11)) %>%
  pivot_longer(-c(No, Group), names_to = "var", values_to = "val") %>%
  mutate(Col = sub("\\d","",var), var = sub("\\w","",var)) %>%
  group_by(Group, var, Col) %>%
  summarise(Mean = mean(val)) %>%
  pivot_wider(names_from = Col, values_from = Mean)

# A tibble: 33 x 4
# Groups:   Group, var [33]
   Group var       s     y
   <fct> <chr> <dbl> <dbl>
 1 1     1     0.384 0.768
 2 1     2     1.54  3.86 
 3 1     3     0.432 2.16 
 4 2     1     0.331 0.662
 5 2     2     1.57  3.94 
 6 2     3     0.454 2.27 
 7 3     1     0.639 1.28 
 8 3     2     1.71  4.26 
 9 3     3     0.403 2.01 
10 4     1     0.355 0.710
# … with 23 more rows

And similarly to get the plot, you can do:

library(tidyr)
library(dplyr)
library(ggplot2)
df %>% mutate(Group = cut(No, breaks = c(seq(1, 165, by = 16), 165), include.lowest = TRUE, labels = 1:11)) %>%
  pivot_longer(-c(No, Group), names_to = "var", values_to = "val") %>%
  mutate(Col = sub("\\d","",var), var = sub("\\w","",var)) %>%
  group_by(Group, var, Col) %>%
  summarise(Mean = mean(val)) %>%
  pivot_wider(names_from = Col, values_from = Mean) %>% 
  ggplot(.,aes(x = as.numeric(Group), y = y, color = as.factor(var), fill = as.factor(var)))+
  geom_line()+
  geom_ribbon(alpha = 0.2, aes(ymin = y-s, ymax = y+s))+
  scale_x_continuous(breaks = 1:11)+
  scale_color_manual(values = c("red","blue","orange"))+
  scale_fill_manual(values = c("red","blue","orange"))+
  theme(legend.position = "bottom")
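One detail about the binning: with these breaks, cut uses right-closed intervals, so the first group covers rows 1 to 17 (17 rows) and the last one covers rows 162 to 165 (4 rows). If you would rather have groups of exactly 16 rows with the remainder in the last group, integer division is a possible alternative. A minimal sketch, using a plain vector `No` as a stand-in for your number column:

```r
No <- 1:165

# Integer division: rows 1-16 -> group 1, rows 17-32 -> group 2, and so on
Group <- (No - 1) %/% 16 + 1

# Number of rows in each group
group_sizes <- table(Group)
print(group_sizes)
```

In the pipeline above, this would replace the cut call, for example mutate(Group = (No - 1) %/% 16 + 1). Group is then already numeric, so as.numeric(Group) is no longer needed.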

Does it answer your question?

If not, please provide a reproducible example of your dataset by following this tutorial: How to make a great R reproducible example

Reproducible example

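library(dplyr)  # provides %>% and mutate()

# Three series of 165 random values
df <- data.frame(No = 1:165,
                 y1 = rnorm(165,1,1),
                 y2 = rnorm(165,4,1),
                 y3 = rnorm(165,2,1))
# Fake standard deviations, proportional to each value
df <- df %>% mutate(s1 = y1*0.5, s2 = y2*0.4, s3 = y3*0.2)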

Hi, thanks dc37. I have provided more details about my code. — Shalen, Feb 18 '20 at 15:17

score 0 · Answer 2 · answered Feb 18 '20 at 15:01

Hi, thanks for your time and kind help. My file contains 12 columns of 165 data points each, with the names A1, A2, ..., A7 and S1, S2, ..., S7. This is my code:

graphics.off()
rm(list=ls())

library(ggplot2)
library(dplyr)


setwd("F:/files/")

file1<-read.csv("F:/Self/ave1.csv")
file2<-read.csv("F:/Self/ave2.csv")

pdf("4.pdf",width=10,height=4)
par(mfrow =c(4, 1))
cl<-rainbow(20)

names(file1)
names(file2)

ggplot() + 
  geom_ribbon(data=file1, aes(x = No., ymax = A1_10 + S1_10, ymin = A1_10 - S1_10),alpha = 0.20, fill=cl[1])+
  geom_line(data = file1, aes(x = No., y = A1_10), colour="red")+
  geom_ribbon(data=file1, aes(x = No., ymax = A2_10 + S2_10, ymin = A2_10 - S2_10),alpha = 0.20, fill=cl[3])+
  geom_line(data = file1, aes(x = No., y = A2_10), colour=cl[3])+
  geom_ribbon(data=file1, aes(x = No., ymax = A3_10 + S3_10, ymin = A3_10 - S3_10),alpha = 0.20, fill=cl[11])+
  geom_line(data = file1, aes(x = No., y = A3_10), colour=cl[11])+
  geom_ribbon(data=file1, aes(x = No., ymax = A5_10 + S5_10, ymin = A5_10 - S5_10),alpha = 0.20, fill=cl[8])+
  geom_line(data = file1, aes(x = No., y = A5_10), colour=cl[8])+
  geom_ribbon(data=file1, aes(x = No., ymax = A6_10 + S6_10, ymin = A6_10 - S6_10),alpha = 0.20, fill=cl[12])+
  geom_line(data = file1, aes(x = No., y = A6_10), colour=cl[8])+
  geom_ribbon(data=file1, aes(x = No., ymax = A7_10 + S7_10, ymin = A7_10 - S7_10),alpha = 0.20, fill=cl[18])+
  geom_line(data = file1, aes(x = No., y = A7_10), colour=cl[8])+
  xlab("Number")+ylab("Value")+
  theme(legend.position="bottom")


ggplot() + 
      geom_ribbon(data=file2, aes(x = No., ymax = A11_10 + S11_10, ymin = A11_10 - S11_10),alpha = 0.20, fill=cl[1])+
      geom_line(data = file2, aes(x = No., y = A11_10), colour="red")+
      geom_ribbon(data=file2, aes(x = No., ymax = A21_10 + S21_10, ymin = A21_10 - S21_10),alpha = 0.20, fill=cl[3])+
      geom_line(data = file2, aes(x = No., y = A21_10), colour=cl[3])+
      geom_ribbon(data=file2, aes(x = No., ymax = A31_10 + S31_10, ymin = A31_10 - S31_10),alpha = 0.20, fill=cl[11])+
      geom_line(data = file2, aes(x = No., y = A31_10), colour=cl[11])+
      geom_ribbon(data=file2, aes(x = No., ymax = A41_10 + S41_10, ymin = A41_10 - S41_10),alpha = 0.20, fill=cl[1])+
      geom_line(data = file2, aes(x = No., y = A41_10), colour="red")+
      geom_ribbon(data=file2, aes(x = No., ymax = A51_10 + S51_10, ymin = A51_10 - S51_10),alpha = 0.20, fill=cl[8])+
      geom_line(data = file2, aes(x = No., y = A51_10), colour=cl[8])+
      geom_ribbon(data=file2, aes(x = No., ymax = A61_10 + S61_10, ymin = A61_10 - S61_10),alpha = 0.20, fill=cl[12])+
      geom_line(data = file2, aes(x = No., y = A61_10), colour=cl[8])+
      geom_ribbon(data=file2, aes(x = No., ymax = A71_10 + S71_10, ymin = A71_10 - S71_10),alpha = 0.20, fill=cl[18])+
      geom_line(data = file2, aes(x = No., y = A71_10), colour=cl[8])+
      xlab("Number")+ylab("Value")+
      theme(legend.position="bottom")


dev.off()

I have several files (5 files) whose plots I wanted to put on one page so that I can compare them. However, it did not work, and I still get one plot per page. I know I could also use facet grid, but I am not yet familiar with how it works, especially since the column names differ between the files. It seems I have a long, long way to go in learning R.

Hi Shalen, answers are meant for posting solutions to the question. Please edit your question instead, add all relevant information about the structure of your file, and then delete this answer. Please read this link to learn how to provide a reproducible example of your dataset: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. For now, it is not of much use. Have you figured out how to adapt my solution to your dataset? Have you tried running my reproducible example and the code to see if it is what you are looking for? — dc37, Feb 18 '20 at 18:35
