I'm trying to create a line plot (I mocked one up in PowerPoint) showing the dependency of weight on age (weight~age) for 3 subgroups, so that each group has its own line. I also want the chart to show:

  • the sample size for each point, meaning the number of individuals in each subgroup at each age.
  • significant differences between the subgroups at each age (TukeyHSD results).

One more important thing: I'm going to have to repeat these graphs for several parameters (like length~age and width~age), and might also have to redo them several times, so I would really like to avoid solutions that need manual insertion, like geom_text, if possible.

I've tried several options but keep getting stuck at some point. For example:

I have tried this code:

plot_morphologic <- ggplot(data = weight_table, 
       mapping = aes(x = as.numeric(age), 
                     y = weight, color=POPULATION))+
  geom_line(se=TRUE)

but that creates one line for the 3 populations...

I've also tried this:

plot_morphologic <- ggline(data=weight_table, x = "age", y = "weight", add = "mean_sd",
       color = "POPULATION")+
  stat_compare_means(aes(group = POPULATION), method = "anova", label = "p.signif", 
                     label.y = c(40),na.rm=F)+
  stat_n_text(group="POPULATION")

but couldn't split the sample size to each subgroup and couldn't add the significance of the differences between the subgroups.

An example of my data:

weight_table1
# A tibble: 246 × 4
   ID         POPULATION age   weight
   <chr>      <chr>      <chr>  <dbl>
 1 Shere Khan A          0      13.4
 2 Shere Khan A          1      14.2
 3 Shere Khan A          2      17.4
 4 Serafina   B          0       5.19
 5 Serafina   B          1      15.3
 6 Serafina   B          2      NA
 7 Kaa        A          0       7.68
 8 Kaa        A          1       6.92
 9 Kaa        A          2      19.4
10 Shenzi     C          0       6.96

Thanks!

Maya Eldar

1 Answer

You can achieve all this with the following base R functions:

  1. Plot the axes and the first line with plot, with the parameters xlim and ylim set to the maximum range you want to plot (usually computed automatically from the data, e.g. with 1.1*range(yourdata$yourcolumn) or so).
  2. Plot the two other lines with lines.
  3. Plot the points and the error bars with points and arrows, as explained in this answer. Note the amusing rant about the "particularly intuitive parameter code=3" ;-)
  4. Add the text with text.
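The four steps above could look roughly like the following sketch. It runs on invented toy data standing in for weight_table; the names toy, means, sds and n are illustrative, and the error bars are assumed to be mean ± SD, as in the ggline attempt in the question.

```r
# Toy data standing in for weight_table (values are invented for illustration)
set.seed(1)
age <- rep(rep(0:2, each = 10), times = 3)
toy <- data.frame(
  POPULATION = rep(c("A", "B", "C"), each = 30),
  age        = age,
  weight     = rnorm(90, mean = 10 + 3 * age, sd = 2)
)
toy$weight[c(5, 47)] <- NA  # a couple of missing values, as in the real data

# Per-group summaries on the non-missing rows: rows = age, columns = population
ok     <- !is.na(toy$weight)
means  <- tapply(toy$weight[ok], list(toy$age[ok], toy$POPULATION[ok]), mean)
sds    <- tapply(toy$weight[ok], list(toy$age[ok], toy$POPULATION[ok]), sd)
n      <- table(toy$age[ok], toy$POPULATION[ok])
ages   <- as.numeric(rownames(means))
groups <- colnames(means)
cols   <- seq_along(groups)

# Step 1: axes and first line, with limits that leave room for bars and labels
plot(ages, means[, 1], type = "l", col = cols[1],
     xlim = range(ages) + c(-0.2, 0.2),
     ylim = range(means - sds, means + sds) + c(0, 1),
     xlab = "age", ylab = "weight", xaxt = "n")
axis(1, at = ages)

# Step 2: the remaining lines
for (j in cols[-1]) lines(ages, means[, j], col = cols[j])

for (j in cols) {
  # Step 3: points and error bars (code = 3 draws a head at both ends)
  points(ages, means[, j], col = cols[j], pch = 16)
  arrows(ages, means[, j] - sds[, j], ages, means[, j] + sds[, j],
         angle = 90, code = 3, length = 0.05, col = cols[j])
  # Step 4: sample size above each upper error bar
  text(ages, means[, j] + sds[, j], labels = n[, j],
       pos = 3, col = cols[j], cex = 0.8)
}
legend("topleft", legend = groups, col = cols, lty = 1, pch = 16, bty = "n")
```

Wrapping this in a function that takes the column name (weight, length, width) as an argument would make it reusable for the other parameters.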

I do not see why the last step is a problem for repeated use cases, because you can programmatically create the label text with sprintf("n=%i", nrow(yourdata)).
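As a hedged illustration of building such labels for every subgroup and age at once, the sketch below uses the ten rows shown in the question. The column names n and label are made up here, and aggregate is just one of several ways to count (table or tapply would also work).

```r
# The ten example rows from the question (age is a character column there)
toy <- data.frame(
  POPULATION = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "C"),
  age        = c("0", "1", "2", "0", "1", "2", "0", "1", "2", "0"),
  weight     = c(13.4, 14.2, 17.4, 5.19, 15.3, NA, 7.68, 6.92, 19.4, 6.96)
)

# Count the non-missing weights per population and age; the formula
# interface of aggregate() drops rows with NA by default
counts <- aggregate(weight ~ POPULATION + age, data = toy, FUN = length)
names(counts)[3] <- "n"

# One label per subgroup and age, ready to be passed to text()
counts$label <- sprintf("n=%i", counts$n)
print(counts)
```

Combinations that have no measured weight (for example population B at age 2 in these rows) simply do not appear in counts, so no label is drawn for them.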

cdalitz
  • Thank you! Can you please explain how to "programmatically create the label text with sprintf("n=%i", nrow(yourdata))"? Do I count the sample sizes and the significance levels manually? How do I add them: all at once or one by one? I find it a little weird that there is no function for that in a program made for statistics... haha – Maya Eldar Jun 19 '22 at 19:48
  • There are functions for these common tasks, of course. To count the number of levels of a factor, you can call `length(levels(x))` where *x* is the categorical variable. For the sample sizes in a data frame, there are `nrow(df)` and `ncol(df)`. You can look up more info by prefixing a question mark, e.g. `?levels` etc. The point of a statistical programming language is that you do not have to live with stuff provided out-of-the-box, but that you can write your own fully automated analysis and visualization programs. – cdalitz Jun 19 '22 at 20:38
  • And one further hint: To find out how to extract the result of `TukeyHSD`, enter `?TukeyHSD` and read the "Value" section, which describes the return value. Apparently, you are interested in "p adj" of the return value. – cdalitz Jun 19 '22 at 20:52
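Following the hint about "p adj", a minimal sketch of pulling the adjusted p-values out of `TukeyHSD` separately for each age might look like this. The data are invented, the names p_adj_by_age and stars are illustrative, and the star thresholds are only the conventional 0.05 / 0.01 / 0.001 cut-offs, which is an assumption rather than something stated in the thread.

```r
# Invented toy data: three populations measured at two ages
set.seed(42)
toy <- data.frame(
  POPULATION = factor(rep(c("A", "B", "C"), each = 20)),
  age        = rep(rep(0:1, each = 10), times = 3),
  weight     = rnorm(60, mean = 10, sd = 2)
)
# Shift population C so that some comparisons are likely to differ
is_c <- toy$POPULATION == "C"
toy$weight[is_c] <- toy$weight[is_c] + 5

# One-way ANOVA plus Tukey HSD within each age; keep only the "p adj" column.
# The result of TukeyHSD() is a list with one matrix per factor in the model.
p_adj_by_age <- lapply(split(toy, toy$age), function(d) {
  fit <- aov(weight ~ POPULATION, data = d)
  TukeyHSD(fit)$POPULATION[, "p adj"]
})

# Turn the adjusted p-values into significance labels (assumed thresholds)
stars <- lapply(p_adj_by_age, function(p) {
  lab <- cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
             labels = c("***", "**", "*", "ns"), include.lowest = TRUE)
  setNames(as.character(lab), names(p))
})
print(p_adj_by_age)
print(stars)
```

Each element of stars is a named character vector (names such as "B-A"), so it can be pasted into a label and placed with text() at the matching age.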