
I have a data frame, df, of about 700 x 500. The first rows look like this:

  Time   Cell1 Cell2 Cell3  Cell4   Cell5   PID   Panels   
  T1    51.7  37.7  11.80  1.89    7.28     111   p1
  T2    49.8  34.8  10.40  1.29    6.01     111   p1
  T3    55.1  34.0  7.61   2.78    6.99     111   p1
  T1    44.1  55.4  29.90  5.26    6.91     112   p2
  T2    58.4  32.6  22.40  7.89    4.28     112   p2
  T3    46.7  49.9  25.20  7.89    4.70     112   p2
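To make the question easier to answer, the six sample rows above can be re-created as a data frame. This is only a sketch of the rows shown, not of the full 700 x 500 data. In the sample table the Cell columns sit at positions 2 to 6, so the sketch selects them by name with `grep()` instead of by position. That assumes every such column name starts with "Cell".

```r
# Re-create the six sample rows shown above (not the full 700 x 500 data)
df <- data.frame(
  Time   = rep(c("T1", "T2", "T3"), times = 2),
  Cell1  = c(51.7, 49.8, 55.1, 44.1, 58.4, 46.7),
  Cell2  = c(37.7, 34.8, 34.0, 55.4, 32.6, 49.9),
  Cell3  = c(11.80, 10.40, 7.61, 29.90, 22.40, 25.20),
  Cell4  = c(1.89, 1.29, 2.78, 5.26, 7.89, 7.89),
  Cell5  = c(7.28, 6.01, 6.99, 6.91, 4.28, 4.70),
  PID    = rep(c(111, 112), each = 3),
  Panels = rep(c("p1", "p2"), each = 3)
)

# Select the Cell columns by name rather than by position
Cell_list <- grep("^Cell", names(df), value = TRUE)
Cell_list
#> [1] "Cell1" "Cell2" "Cell3" "Cell4" "Cell5"
```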

Now I want to make boxplots of the distribution of each of Cell1, Cell2, ... (y-axis) over Time (here, 3 time points).

I need to write the ggplot() call so that it cycles through the Cell column names and makes a separate boxplot for each one. The fill and the colour need to be by Panels, and the y-axis needs to be on a log scale. The values in df are not log-transformed (I can do that beforehand if passing it as an argument is a hassle).
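For a single, hard-coded column, those requirements might be sketched as follows. The sketch assumes ggplot2 is installed and uses a cut-down copy of the sample rows with only Cell1. `scale_y_log10()` puts the y-axis on a log scale without changing the data, so df would not need to be log-transformed beforehand.

```r
library(ggplot2)

# Cut-down copy of the sample data (Cell1 only)
df <- data.frame(
  Time   = rep(c("T1", "T2", "T3"), times = 2),
  Cell1  = c(51.7, 49.8, 55.1, 44.1, 58.4, 46.7),
  Panels = rep(c("p1", "p2"), each = 3)
)

# Column names go unquoted inside aes(); colour and fill both follow Panels
p <- ggplot(df, aes(x = Time, y = Cell1, colour = Panels, fill = Panels)) +
  geom_boxplot(alpha = 0.3) +
  scale_y_log10() +   # log10 axis; the data in df stay untransformed
  theme_bw() +
  xlab("")
print(p)
```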

My snippet (below) fails.

I cannot figure out how to make the code cycle through the df column headers, i.e. how to pass the y-axis column name as a string argument so that aes() in ggplot() maps it to that column's values in df. As I understand it, the strings are passed as arguments but are not recognized as columns of df (please correct me if I am wrong).
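The suspicion above can be checked directly: inside `aes()`, a character variable holding "Cell2" is treated as a constant value, not as a column name. The minimal sketch below assumes ggplot2 3.0.0 or later, which supports the `.data` pronoun. The variable name `col` is only for illustration.

```r
library(ggplot2)

df <- data.frame(
  Time  = rep(c("T1", "T2", "T3"), times = 2),
  Cell2 = c(37.7, 34.8, 34.0, 55.4, 32.6, 49.9)
)

col <- "Cell2"  # column name held in a character variable

# Wrong: the string itself becomes the y value (one constant category)
p_wrong <- ggplot(df, aes(x = Time, y = col)) + geom_point()

# Right: .data[[col]] looks the name up as a column of df
p_right <- ggplot(df, aes(x = Time, y = .data[[col]])) + geom_point()

length(unique(layer_data(p_wrong)$y))  # 1: every point at the same y
layer_data(p_right)$y                  # the Cell2 values
```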

Any suggestions or workarounds would be appreciated. I just do not want to write the ggplot() call so many times, but I can't figure out how to write the loop here or wrap the plot in a function.
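One way to avoid repeating the ggplot() call is to wrap it in a function that takes the column name as a string, then apply that function to every name. This is a sketch assuming ggplot2 3.0.0 or later. `plot_cell()` is a hypothetical helper name, and the data are a cut-down copy of the sample rows.

```r
library(ggplot2)

df <- data.frame(
  Time   = rep(c("T1", "T2", "T3"), times = 2),
  Cell1  = c(51.7, 49.8, 55.1, 44.1, 58.4, 46.7),
  Cell2  = c(37.7, 34.8, 34.0, 55.4, 32.6, 49.9),
  Panels = rep(c("p1", "p2"), each = 3)
)

# Hypothetical helper: one boxplot for the column whose name is in `cell`
plot_cell <- function(data, cell) {
  ggplot(data, aes(x = Time, y = .data[[cell]], colour = Panels, fill = Panels)) +
    geom_boxplot(alpha = 0.3) +
    scale_y_log10() +          # log10 y-axis
    labs(x = NULL, y = cell) + # use the column name as the axis title
    theme_bw()
}

# Build one plot per Cell column and keep them in a named list
Cell_list <- c("Cell1", "Cell2")
plots <- setNames(lapply(Cell_list, function(cell) plot_cell(df, cell)), Cell_list)

print(plots[["Cell2"]])  # draw one of them
```

Returning the plots in a list, instead of only printing them, makes it easy to draw them later or save each one with `ggsave()`.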

Cell_list=colnames(df[6:10])

# create for loop to produce ggplot2 graphs 
for (i in seq_along(Cell_list)) { 

    plot <- 
        ggplot (df, aes(x=df$Time,y=Cell_list[i], col=df$Panels,.desc=TRUE))+ 
        geom_boxplot(outlier.size = 0 ) + 
        geom_point(aes(fill=df$Panels, col=NULL),shape=21, alpha=0.5, size=2, 
                   position = position_jitterdodge(jitter.width = 0.2))+
        theme_bw() + xlab("")
    # print plots to screen
    print(plot)
}
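A corrected version of the loop above might look like the following sketch, which assumes ggplot2 3.0.0 or later and a cut-down copy of the sample rows. The changes: no `df$` inside `aes()`; the column name is looked up with `.data[[...]]`; `.desc = TRUE`, which is not a ggplot2 aesthetic, is dropped; `fill = Panels` is in the main mapping so `position_jitterdodge()` can dodge the points in line with the boxes; outliers are hidden because the points already show every value.

```r
library(ggplot2)

df <- data.frame(
  Time   = rep(c("T1", "T2", "T3"), times = 2),
  Cell1  = c(51.7, 49.8, 55.1, 44.1, 58.4, 46.7),
  Cell2  = c(37.7, 34.8, 34.0, 55.4, 32.6, 49.9),
  Panels = rep(c("p1", "p2"), each = 3)
)

Cell_list <- c("Cell1", "Cell2")
plot_list <- vector("list", length(Cell_list))

for (i in seq_along(Cell_list)) {
  p <- ggplot(df, aes(x = Time, y = .data[[Cell_list[i]]],
                      colour = Panels, fill = Panels)) +
    geom_boxplot(alpha = 0.3, outlier.shape = NA) +  # points below show all values
    geom_point(aes(colour = NULL), shape = 21, alpha = 0.5, size = 2,
               position = position_jitterdodge(jitter.width = 0.2)) +
    scale_y_log10() +
    theme_bw() +
    labs(x = NULL, y = Cell_list[i])
  print(p)             # draw to the current graphics device
  plot_list[[i]] <- p  # keep the plot for later use
}
```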
ivivek_ngs
  • You should *never* use `data$column` inside `aes()`, it will mess things up. `aes()` expects unquoted column names. When you need to loop over columns, there are a few options described at the linked FAQ. An old, but easy, way would be to use `aes_string()` instead of `aes()`, which expects string (quoted) column names: `ggplot(df, aes_string(x = "Time", y = Cell_list[i], color = "Panels")) + ...`. You can log the y axis with `+ scale_y_continuous(trans = "log")` – Gregor Thomas Sep 26 '19 at 00:50
  • This is amazing, thanks for this solution. I should have searched better. Many thanks also for pointing me to the FAQ. – ivivek_ngs Sep 26 '19 at 17:32
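The `aes_string()` suggestion from the comments, with the log axis added, might look like this sketch. It assumes ggplot2 is installed and uses a cut-down copy of the sample rows. Current ggplot2 documentation marks `aes_string()` as deprecated in favour of the `.data` pronoun, so recent versions may print a deprecation warning; newer releases may also prefer `transform =` over `trans =`. Note that `trans = "log"` gives a natural-log axis, whereas `scale_y_log10()` gives base 10.

```r
library(ggplot2)

df <- data.frame(
  Time   = rep(c("T1", "T2", "T3"), times = 2),
  Cell3  = c(11.80, 10.40, 7.61, 29.90, 22.40, 25.20),
  Panels = rep(c("p1", "p2"), each = 3)
)

Cell_list <- c("Cell3")

for (i in seq_along(Cell_list)) {
  # aes_string() takes quoted column names, so Cell_list[i] can be passed directly
  p <- ggplot(df, aes_string(x = "Time", y = Cell_list[i], color = "Panels")) +
    geom_boxplot() +
    scale_y_continuous(trans = "log") +  # natural-log y-axis
    theme_bw()
  print(p)
}
```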
