I'm currently rewriting an article that someone else wrote a while ago, which I could not follow. The article has a table titled "Table 2: median and interquartile range of severity of depressive symptoms and serum levels of interleukin-6 and tumor necrosis factor at baseline, after intervention and at 6 and 12 months of follow-up.".

This is the table I'm talking about (Table 2, originally posted as an image).

The data frame currently has one column per variable and one row per subject, for example il6_baseline, il6_6mon, il6_12mon, and il6_after (for interleukin-6). The tumor necrosis factor variables follow the same pattern. All of these are continuous variables.
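If the full data frame really uses names like `il6_baseline` and `il6_6mon`, the single `_` in each name separates the marker from the time point, so `tidyr::pivot_longer()` can split them in one step. A minimal sketch under that assumption: the `tnf_*` column names, the `id`/`group` columns, and all values below are placeholders, not the real data.

```r
library(tidyr)

# Placeholder data for illustration only; tnf_* names, id and group are assumed
wide <- data.frame(
  id           = 1:2,
  group        = c("tcc", "pdse"),
  il6_baseline = c(10, 20),
  il6_after    = c(11, 21),
  il6_6mon     = c(12, 22),
  il6_12mon    = c(13, 23),
  tnf_baseline = c(1, 2),
  tnf_after    = c(1.1, 2.1),
  tnf_6mon     = c(1.2, 2.2),
  tnf_12mon    = c(1.3, 2.3)
)

# One row per subject x marker x time point; "il6_6mon" -> marker "il6", time "6mon"
long <- pivot_longer(
  wide,
  cols      = -c(id, group),
  names_to  = c("marker", "time"),
  names_sep = "_",
  values_to = "level"
)
```

The result has one row per subject, marker, and time point, which works directly with `aes(x = time, y = level)` plus `facet_wrap(vars(marker))` in ggplot2.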

In the table, "TCC" and "PDSE" are two groups that received different treatments during that period.

I already know how to compute the medians and the rest. My problem is choosing which kind of graph illustrates this information best, and I'd appreciate some basic syntax that I can build on. I'm new to R; I can usually get things done, but I have never worked much with graphs, and now this is the obstacle in front of me.

Thanks for your understanding and attention. Have a good day!

Output of `dput()` for a subset of the data (assigned to `df` here so it can be pasted straight into R):

```r
df <- structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176,
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696,
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA,
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163,
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748,
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
```

In the subset above, the "a02rec" variable is only a subject identifier and can be ignored. The variables whose names start with "IL_6" or "ND_IL" hold the collected serum levels, and "modeloterapia" records whether the subject attended the PDSE or the TCC therapy model. As I said in a previous comment, I want three graphs, one for each group (PDSE, TCC, and the total sample), each with boxes showing the serum levels at these time points.

I'm not sure whether a box plot or a point/dot plot would better achieve this. I'd like the graphs to show the change in serum levels between periods (initial/baseline, 6 months, 12 months, and after treatment).
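Box and dot plots don't have to be an either/or choice: a common pattern is boxes for the median and IQR with the individual subjects jittered on top. A minimal sketch using the posted subset, with a third "total" panel built by stacking a copy of every row. It assumes the three IL-6 columns are already in chronological order; the `group` and `plot_data` names are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176,
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696,
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA,
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163,
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748,
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")

# IL-6 columns in the order they should appear on the x-axis (assumed chronological)
il6_cols <- c("ND_IL_6I", "ND_IL6_intermed", "IL_6_6mesesultimovalorITT")

# Long format: one row per subject x time point
long <- as.data.frame(
  pivot_longer(df, all_of(il6_cols), names_to = "time", values_to = "il6")
)
long <- long[!is.na(long$il6), ]                  # drop missing measurements
long$time  <- factor(long$time, levels = il6_cols)
long$group <- as.character(long$modeloterapia)

# Duplicate every row under the label "total" so the whole sample gets its own panel
total <- long
total$group <- "total"
plot_data <- rbind(long, total)
plot_data$group <- factor(plot_data$group, levels = c("pdse", "tcc", "total"))

p <- ggplot(plot_data, aes(x = time, y = il6)) +
  geom_boxplot(outlier.shape = NA) +        # boxes: median and IQR (outliers shown as dots instead)
  geom_jitter(width = 0.15, alpha = 0.6) +  # dots: individual subjects
  facet_wrap(vars(group), nrow = 1) +
  labs(x = "Time point", y = "Serum IL-6")
```

Call `print(p)` to draw it. Because these values are strongly right-skewed, adding `scale_y_log10()` may make the boxes easier to compare.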

Tung
  • Please share a sample from your data frame - `dput()` is the best way to share data because it is copy/pasteable and includes class and structure information, e.g., `dput(your_data[1:10, ])` for the first 10 rows. (Pick a suitable subset to illustrate the problem.) – Gregor Thomas Dec 09 '20 at 15:35
  • As for "what kind of graph do I use" - please be clearer about your goals for the visualization. What do you want to compare? What do you want the reader to see? You could, for example, do small multiples for each treatment group, plotting the estimates as points with error bars for the ranges, with the time on the x-axis. This ignores the p-values and emphasizes within-treatment comparisons. If you want to emphasize the trend, we could connect the estimates with lines. Perhaps use shaded areas for the confidence intervals.... – Gregor Thomas Dec 09 '20 at 15:40
  • There are hundreds of ways to visualize this information, and which one is best will depend on **your goals** - we can't just tell you what is best. Unless you can be more specific about the visualization you want, the question will probably be closed for "needing more detail" or for being "opinion-based". – Gregor Thomas Dec 09 '20 at 15:41
  • Great - please edit your question to include specifications for the graph instead of putting it down here in comments. We can help you with the syntax for that, and that should give you a good starting place to experiment with variations of the graph as well. – Gregor Thomas Dec 09 '20 at 15:55
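The small-multiples idea from the comments can be sketched like this: medians as points, interquartile ranges as error bars (the same statistics Table 2 reports), lines connecting the time points, and one panel per treatment group. This is a sketch on the IL-6 columns of the posted subset only; it assumes those columns are in chronological order and, like the comment says, ignores the p-values.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176,
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696,
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA,
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163,
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748,
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")

# Assumed chronological order of the IL-6 columns
il6_cols <- c("ND_IL_6I", "ND_IL6_intermed", "IL_6_6mesesultimovalorITT")

# Median and quartiles per treatment group and time point
summary_data <- df %>%
  pivot_longer(all_of(il6_cols), names_to = "time", values_to = "il6") %>%
  mutate(time = factor(time, levels = il6_cols)) %>%
  group_by(modeloterapia, time) %>%
  summarise(
    median  = median(il6, na.rm = TRUE),
    q1      = unname(quantile(il6, 0.25, na.rm = TRUE)),
    q3      = unname(quantile(il6, 0.75, na.rm = TRUE)),
    .groups = "drop"
  )

# Points = medians, vertical bars = IQR, lines connect time points within a group
p <- ggplot(summary_data, aes(x = time, y = median, group = modeloterapia)) +
  geom_line() +
  geom_pointrange(aes(ymin = q1, ymax = q3)) +
  facet_wrap(vars(modeloterapia), ncol = 1) +
  labs(x = "Time point", y = "Serum IL-6 (median and IQR)")
```

Replacing `geom_pointrange()` with `geom_ribbon(aes(ymin = q1, ymax = q3), alpha = 0.2)` gives the shaded-area variant mentioned in the comment.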

The key to using ggplot2 effectively is converting your data to a long format.

```r
library(dplyr)  # for %>%
library(tidyr)  # for pivot_longer() and separate()

long_data = df %>%
  pivot_longer(matches("IL")) %>%
  separate(name, sep = "_", into = c("drug", "something", "time"))
```

```r
head(long_data)
# # A tibble: 6 x 6
#   a02rec modeloterapia drug  something time                 value
#    <dbl> <fct>         <chr> <chr>     <chr>                <dbl>
# 1   2925 tcc           ND    IL        6I                   156. 
# 2   2925 tcc           ND    IL6       intermed              NA  
# 3   2925 tcc           IL    6         6mesesultimovalorITT  62.2
# 4   2461 pdse          ND    IL        6I                    25.4
# 5   2461 pdse          ND    IL6       intermed              NA  
# 6   2461 pdse          IL    6         6mesesultimovalorITT  59.3
```

I'm not sure what some parts of your column names mean, so the new column names (drug, something, time) are guesses that you will probably need to correct.

With data in that format, plotting is relatively straightforward. Here is an example (which looks a little weird due to the small sample of data).

```r
library(ggplot2)

ggplot(long_data, aes(x = time, y = value, fill = drug)) +
  geom_boxplot() +
  facet_wrap(vars(modeloterapia), ncol = 1)
```


If you need to reorder the x-axis, convert the x-axis variable to a factor with its levels in the order you want, as in this answer.
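A minimal sketch of that reordering which also swaps the raw column names for readable labels instead of splitting them on `_`. The mapping from columns to time points below is hypothetical (for example, that `ND_IL6_intermed` is the after-intervention measurement); replace it with what each column actually measures.

```r
library(tidyr)
library(ggplot2)

df <- structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176,
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696,
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA,
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163,
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748,
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")

# Hypothetical mapping from column name to time label, in the desired x-axis order
time_labels <- c(
  ND_IL_6I                  = "Baseline",
  ND_IL6_intermed           = "After intervention",
  IL_6_6mesesultimovalorITT = "6 months"
)

long_data <- pivot_longer(df, all_of(names(time_labels)),
                          names_to = "column", values_to = "value")

# The order of the factor levels sets the left-to-right order on the x-axis
long_data$time <- factor(time_labels[long_data$column], levels = unname(time_labels))

p <- ggplot(long_data, aes(x = time, y = value)) +
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(vars(modeloterapia), ncol = 1)
```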

Gregor Thomas
  • Thanks Gregor! Your explanations really helped me improve my knowledge of ggplot2. I'm slowly getting used to some of the patterns in ggplot: I see now that we often have to turn variable names into a new variable to get what we want as labels, etc. I was finally able to do what I wanted! Have a nice day! – Bruno Montezano Dec 10 '20 at 23:15