I'm trying to write a ggplot for-loop, without success. I want one scatter plot per amino acid (so 22 separate scatter plots, each containing only the values for that amino acid). Instead, every value is plotted in every output plot.

The data file looks like this:

dput(head(df_melt_differentials))

structure(list(codon = c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC"
), Fed_differential_cutoff0.5 = c(0.405320284943889, 0.538603465353382, 
0.594679715056111, 0.461396534646618, 0.279723500180007, 0.350047954876902
), Fed_differential_cutoff0 = c(0.400929382467845, 0.541230665098641, 
0.599070617532155, 0.458769334901359, 0.281177150483858, 0.351472083384939
), Fed_differential_cutoff1 = c(0.389691692491739, 0.572371186663778, 
0.610308307508261, 0.427628813336222, 0.258694141571916, 0.371346938275356
), Fed_differential_cutoff2 = c(0.376102000883263, 0.543866386823925, 
0.623897999116737, 0.456133613176075, 0.240118371752021, 0.371624132164088
), Starved_differential_cutoff0.5 = c(0.35341548435504, 0.612764761460883, 
0.64658451564496, 0.387235238539117, 0.241749339598093, 0.401216490580919
), Starved_differential_cutoff0 = c(0.351704818898789, 0.613092767267543, 
0.648295181101211, 0.386907232732457, 0.242028282002779, 0.398227680007641
), Starved_differential_cutoff1 = c(0.351258676092076, 0.616216524001233, 
0.648741323907924, 0.383783475998767, 0.236979413320061, 0.417121137360074
), Starved_differential_cutoff2 = c(0.330195165073707, 0.631859350667716, 
0.669804834926293, 0.368140649332284, 0.226783649173637, 0.440433256347991
), AA = c("K", "N", "K", "N", "T", "T"), full_amino = c("Lysine", 
"Asparagine", "Lysine", "Asparagine", "Threonine", "Threonine"
), aminoacid = c("Lys", "Asn", "Lys", "Asn", "Thr", "Thr"), wobble = c("AT_wobble", 
"GC_wobble", "GC_wobble", "AT_wobble", "AT_wobble", "GC_wobble"
), wobble_single = c("A_wobble", "C_wobble", "G_wobble", "T_wobble", 
"A_wobble", "C_wobble")), row.names = c(NA, 6L), class = "data.frame")

My loop is:

for (aminoacid in df_melt_differentials$aminoacid) {
  
  cutoff0_gingold_loop <- ggplot(df_melt_differentials, aes(x=Fed_differential_cutoff0, y= Starved_differential_cutoff0)) +
    geom_point(aes(color = wobble)) +
    theme_bw(base_size = 16)+
    labs(title = paste(aminoacid, "RSCU of Differential Genes (Log2FC cutoff = 0)")) +
    geom_abline(slope = 1, intercept = 0, linetype= "dashed")
  
  cutoff0_gingold_loop +
    geom_label_repel(aes(label = codon),
                     box.padding   = 0.35, 
                     point.padding = 0.5,
                     segment.color = 'grey50') +
    theme_classic()
  
    ggsave(filename = paste(aminoacid, "RSCU_FvS_differential_cutoff0_gingold.png", sep = "_"), bg = "white", width = 7, height = 7, dpi = 600)
}

I know it's probably a silly mistake but I can't seem to figure out where I've gone wrong.

I also have a secondary question, though I'm not too bothered if it isn't answered: I normally end up with 4 different scatter plots, one for each of my 4 cutoffs (0, 0.5, 1 and 2). Is there a way to incorporate this into the loop? Ideally, I'd like Fed_differential_cutoff0 vs Starved_differential_cutoff0 for each individual amino acid, and the same for cutoff0.5, cutoff1 and cutoff2.

Thanks in advance!

Danby
  • Consider adding `dput(head(df_melt_differentials))` to your reprex to share the exact data in machine-readable form, so people don't have to parse a text table to reproduce your plots. – Nate Jun 22 '20 at 15:13
  • As to your secondary question: yes, you could do a nested loop, or you could convert your data to a long format (see [this FAQ](https://stackoverflow.com/q/2185252/903061)) and use facets. That would probably be nicer, giving you all four cutoffs as subplots. – Gregor Thomas Jun 22 '20 at 15:54
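
The long-format-plus-facets idea from the comment above could look roughly like the sketch below. It assumes the tidyr package is available. `toy`, `long` and `p` are hypothetical names, the values are rounded from the first rows of the posted data, and only two of the four cutoffs are included to keep it short.

```r
library(ggplot2)
library(tidyr)  # assumption: tidyr is installed; used only for pivot_longer()

# Toy data: values rounded from the posted dput(), two cutoffs only
toy <- data.frame(
  codon = c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC"),
  Fed_differential_cutoff0       = c(0.401, 0.541, 0.599, 0.459, 0.281, 0.351),
  Starved_differential_cutoff0   = c(0.352, 0.613, 0.648, 0.387, 0.242, 0.398),
  Fed_differential_cutoff0.5     = c(0.405, 0.539, 0.595, 0.461, 0.280, 0.350),
  Starved_differential_cutoff0.5 = c(0.353, 0.613, 0.647, 0.387, 0.242, 0.401),
  aminoacid = c("Lys", "Asn", "Lys", "Asn", "Thr", "Thr"),
  wobble = c("AT_wobble", "GC_wobble", "GC_wobble",
             "AT_wobble", "AT_wobble", "GC_wobble")
)

# Split each column name at "_differential_cutoff":
# the part before becomes a value column (Fed / Starved),
# the part after becomes the entry in a new `cutoff` column.
long <- pivot_longer(
  toy,
  cols = matches("_differential_cutoff"),
  names_to = c(".value", "cutoff"),
  names_sep = "_differential_cutoff"
)

# One amino acid, one panel per cutoff
p <- ggplot(subset(long, aminoacid == "Lys"), aes(x = Fed, y = Starved)) +
  geom_point(aes(color = wobble)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  facet_wrap(~ cutoff, labeller = label_both) +
  theme_bw(base_size = 16)
```

With the data in this shape, a loop over amino acids only has to subset `long` and save one faceted figure per amino acid.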

1 Answer

You don't have a subset anywhere. I would rewrite as:

library(ggplot2)
library(ggrepel)  # provides geom_label_repel()

# Loop over each distinct amino acid once, not once per row
for (this_aminoacid in unique(df_melt_differentials$aminoacid)) {

  # Base scatter plot, restricted to the rows for the current amino acid
  cutoff0_gingold_loop <- ggplot(
    data = subset(df_melt_differentials, aminoacid == this_aminoacid),
    aes(x = Fed_differential_cutoff0, y = Starved_differential_cutoff0)
  ) +
    geom_point(aes(color = wobble)) +
    theme_bw(base_size = 16) +
    labs(title = paste(this_aminoacid, "RSCU of Differential Genes (Log2FC cutoff = 0)")) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed")

  # Add codon labels; the labelled plot is not stored in a variable
  cutoff0_gingold_loop +
    geom_label_repel(aes(label = codon),
                     box.padding   = 0.35,
                     point.padding = 0.5,
                     segment.color = 'grey50') +
    theme_classic()

  # No plot argument is given, so ggsave() falls back to last_plot()
  ggsave(filename = paste(this_aminoacid, "RSCU_FvS_differential_cutoff0_gingold.png", sep = "_"), bg = "white", width = 7, height = 7, dpi = 600)
}

I have

  • added subset to tell R which data to use each time
  • changed the name of the looping variable to this_aminoacid for clarity
  • looped over unique(df_melt_differentials$aminoacid) so each value is used only once, instead of however many times it shows up in your data
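
For the secondary question, one option is a nested loop over the cutoffs that builds the column names as strings. The sketch below makes some assumptions. `toy`, `cutoffs`, `out_dir` and `saved` are hypothetical names, the data are rounded values from the first rows of the question with only two cutoffs, and the files go to a temporary directory. It uses the `.data` pronoun so that `aes()` can take column names held in variables, and it passes the plot to `ggsave()` explicitly rather than relying on `last_plot()`. Codon labels are left out for brevity and can be added as in the loop above.

```r
library(ggplot2)

# Toy data: values rounded from the question's dput(), two cutoffs only
toy <- data.frame(
  codon = c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC"),
  Fed_differential_cutoff0       = c(0.401, 0.541, 0.599, 0.459, 0.281, 0.351),
  Starved_differential_cutoff0   = c(0.352, 0.613, 0.648, 0.387, 0.242, 0.398),
  Fed_differential_cutoff0.5     = c(0.405, 0.539, 0.595, 0.461, 0.280, 0.350),
  Starved_differential_cutoff0.5 = c(0.353, 0.613, 0.647, 0.387, 0.242, 0.401),
  aminoacid = c("Lys", "Asn", "Lys", "Asn", "Thr", "Thr"),
  wobble = c("AT_wobble", "GC_wobble", "GC_wobble",
             "AT_wobble", "AT_wobble", "GC_wobble")
)

cutoffs <- c("0", "0.5")   # extend to c("0", "0.5", "1", "2") for the full data
out_dir <- tempdir()       # assumption: write the files to a temporary directory
saved   <- character(0)    # paths of the files written, for checking afterwards

for (this_aminoacid in unique(toy$aminoacid)) {
  aa_data <- subset(toy, aminoacid == this_aminoacid)

  for (cutoff in cutoffs) {
    # Build the Fed/Starved column names for this cutoff
    x_col <- paste0("Fed_differential_cutoff", cutoff)
    y_col <- paste0("Starved_differential_cutoff", cutoff)

    # .data[[name]] looks up a column by a name stored in a variable
    p <- ggplot(aa_data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point(aes(color = wobble)) +
      geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
      labs(
        title = paste0(this_aminoacid,
                       " RSCU of Differential Genes (Log2FC cutoff = ", cutoff, ")"),
        x = x_col,
        y = y_col
      ) +
      theme_classic()

    out_file <- file.path(
      out_dir,
      paste0(this_aminoacid, "_RSCU_FvS_differential_cutoff", cutoff, "_gingold.png")
    )
    ggsave(filename = out_file, plot = p, bg = "white",
           width = 7, height = 7, dpi = 600)
    saved <- c(saved, out_file)
  }
}
```

This writes one file per amino acid and cutoff combination.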
Gregor Thomas