I am starting over with R and ggplot2 to visualize time series data of environmental variables. So far I love the possibilities ggplot2 offers for visualizing the data: it is easy to choose different periods and variables to plot and to define aesthetics. But now I have run into the first problem that I wasn't really able to google:

  • My goal is to plot several variables from different data frames with individual aesthetics (fixed period, same y-axis, different colors, etc.) in one plot.

I have 8 data frames ("TreeA" to "TreeH") structured as follows, where TreeA is the name of the data frame, "Time" is the time of measurement in POSIXct format, and Tleaf, Tair and Tdiff are three of 16 variables:

 TreeA
                         Zeit  Tleaf     Tair  Tdiff ........
       1: 2018-05-18 00:00:00 12.997 13.20000 -0.203   
       2: 2018-05-18 00:10:00 13.082 13.20000 -0.119     
       3: 2018-05-18 00:20:00 11.909 12.06700 -0.158   
       4: 2018-05-18 00:30:00 11.315 11.53300 -0.219     
       5: 2018-05-18 00:40:00 11.251 11.46700 -0.216

I have melted the data frames to long format, resulting in:

TreeA_long
                      Time variable        value
    1: 2018-05-18 00:00:00    Tleaf        12.997000000
    2: 2018-05-18 00:10:00    Tleaf        13.082000000
    3: 2018-05-18 00:20:00    Tair         11.909
    4: 2018-05-18 00:30:00    Tair         11.315
    5: 2018-05-18 00:40:00    Tdiff         1.251

From this I have been successfully plotting graphs with the following ggplot2 code:

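library(ggplot2)
library(scales)  # provides date_format() and date_breaks()

# start.endKW21: the plotting period passed as x-axis limits (not shown here)
ggplot(subset(TreeA_long, variable %in% c("Tleaf", "Tair", "Tdiff")),
       aes(x = Time, y = value, color = variable)) +
  geom_line() +
  # one tick per day, labelled with the day of the month
  scale_x_datetime(limits = start.endKW21, labels = date_format("%d"), breaks = date_breaks("24 hours")) +
  scale_y_continuous(limits = c(5, 55), breaks = seq(10, 55, by = 2)) +
  labs(title = "Mai/Juni Cbet1", x = "Day", y = "Temperature") +
  theme(legend.position = "right") +
  # fixed color per variable
  scale_color_manual(values = c("Tleaf" = "green", "Tair" = "blue", "Tdiff" = "yellow"))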

I have tried adding a second geom_line(data=TreeB_long) to plot variables from the second data frame in the same plot. That did plot all the variables from TreeB, but of course I need to compare the same variables, and I also want to specify aesthetics (line color, dashed lines, etc.) for each variable.

So my question is:

  • How can I compare TreeA to TreeB in one Plot?
  • I would also be open to merging the different data frames, but I cannot get them combined in long format because they share the same variable names.

I hope that my questions are clear enough and that you can help me somehow. I believe there is an easy solution to my problem, but as I said, googling didn't yield good results so far.

Thank you and have a good day! Konrad

Konrad Bauer
  • I think you should probably append the `treeA`-`treeH` datasets, including an indicator variable for the name of the data (e.g. `dplyr::bind_rows(tibble::lst(treeA, treeB, <...>, treeH), .id = "data")`), then `melt()` and use the dataset indicator variable to construct your plot. If you need more specific advice, it would be helpful if you included a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Mikko Marttila Jul 05 '18 at 12:38
  • Thank you. I failed to produce a reproducible example, because I cannot upload my *csv inputs. Anyway, with your great help I am just one step away from the solution: using `dplyr::bind_rows` I get one data frame with the column `data` indicating whether the measurements belong to `treeA, treeB,...`. Can I tell the `melt()` function to build the name for the `variable` column from the respective column `Tleaf,Tair...` combined with the `data` column, resulting in a long dataset with a variable column with the entries `treeATleaf, treeBTleaf,...`? – Konrad Bauer Jul 09 '18 at 11:49
  • You can't do that directly with `melt()`; but if you wanted to, you could just `paste()` together the `data` and `variable` columns after the `melt()` to achieve that. However, it might be a good idea to keep the two separate: that will allow for easier control of the aesthetics for plotting. I've added an answer demonstrating the approach that I would take here. – Mikko Marttila Jul 09 '18 at 12:36
  • By the way, if you need to create an example with complex/large data, it can be a good idea to use `dput(head(data))` to easily generate code needed to read in a small subset of your data. – Mikko Marttila Jul 09 '18 at 12:42
  • Ok thanks. How would I tell the `aes` function to draw information from two columns? – Konrad Bauer Jul 09 '18 at 13:12
  • You can use expressions in `aes`, so, for example, if you wanted to have a different colour for each combination of `data` and `variable` you could specify `colour = paste(data, variable)`. (Or alternatively e.g. `interaction()` rather than `paste()`, but this will rarely matter.) – Mikko Marttila Jul 09 '18 at 13:30
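
A minimal sketch of the `aes()` suggestion from the last comment. The data frame `demo` and its values are made up for illustration and are not the real measurements; only the column names `data`, `variable`, `dttm` and `value` follow the thread:

```r
library(ggplot2)

# Hypothetical long-format data: 2 datasets x 2 variables x 3 time points
start <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC")
demo <- data.frame(
  dttm     = start + rep(c(0, 600, 1200), times = 4),
  data     = rep(rep(c("treeA", "treeB"), each = 3), times = 2),
  variable = rep(c("Tleaf", "Tair"), each = 6),
  value    = c(13.0, 13.1, 11.9, 12.5, 12.9, 11.2,   # Tleaf (placeholder values)
               13.2, 13.2, 12.1, 13.6, 13.0, 12.4)   # Tair  (placeholder values)
)

# One colour (and therefore one line) per combination of dataset and variable
p <- ggplot(demo, aes(dttm, value, colour = interaction(data, variable))) +
  geom_line()
```

Printing `p` should draw four lines, one for each dataset/variable pair. Swapping `interaction(data, variable)` for `paste(data, variable)` should give the same grouping with slightly different legend labels.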

2 Answers

I think you should probably append the treeA-treeH datasets, including an indicator variable for the name of the data (e.g. dplyr::bind_rows(tibble::lst(treeA, treeB, <...>, treeH), .id = "data")), then melt() and use the dataset indicator variable to construct your plot.

Here's a simplified example. First, let's read in the data that you give:

txt <- "Date Time  Tleaf     Tair  Tdiff
2018-05-18 00:00:00 12.997 13.20000 -0.203
2018-05-18 00:10:00 13.082 13.20000 -0.119
2018-05-18 00:20:00 11.909 12.06700 -0.158
2018-05-18 00:30:00 11.315 11.53300 -0.219
2018-05-18 00:40:00 11.251 11.46700 -0.216"

treeA <- read.table(text = txt, header = TRUE,
                    stringsAsFactors = FALSE)

For the sake of the example, I'm also creating a treeB dataset by just adding some noise to treeA:

library(dplyr)
library(ggplot2)

set.seed(1)
n <- nrow(treeA)

treeB <- treeA %>%
  mutate_if(is.numeric, function(x) x + rnorm(n))

We can now append the two datasets with bind_rows() and add a variable to show the original data frame.

tree <- tibble::lst(treeA, treeB) %>%
  bind_rows(.id = "data") %>%
  mutate(dttm = as.POSIXct(paste(Date, Time)))

Before plotting, it's useful to reshape the data to long form, as you have done before:

tree_long <- reshape2::melt(tree, measure.vars = c("Tleaf", "Tair", "Tdiff"))

Now we are ready to plot. The choice of layout will of course depend on what aspect of the data you want to emphasize; for example, if the comparison between different tree datasets is of interest, it might be a good idea to use faceting to compare the trees within each variable:

ggplot(tree_long, aes(dttm, value, color = data)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  geom_line()

Created on 2018-07-09 by the reprex package (v0.2.0.9000).
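
If everything should go into a single panel instead, one option is to map `variable` to colour and `data` to linetype, which keeps the two columns separate as suggested in the comments. This is only a sketch: the small `tree_long` below is a stand-in with the same column names as above, the treeB values are invented, and the colour and linetype choices are arbitrary.

```r
library(ggplot2)

# Stand-in for tree_long from above (treeA values from the question, treeB invented)
start <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC")
tree_long <- data.frame(
  data     = rep(c("treeA", "treeB"), each = 6),
  dttm     = start + rep(c(0, 600, 1200), times = 4),
  variable = rep(rep(c("Tleaf", "Tair"), each = 3), times = 2),
  value    = c(12.997, 13.082, 11.909, 13.200, 13.200, 12.067,  # treeA
               12.4, 13.3, 11.1, 14.8, 13.5, 11.2)              # treeB (invented)
)

# Colour distinguishes the variable, linetype distinguishes the tree
p <- ggplot(tree_long, aes(dttm, value, colour = variable, linetype = data)) +
  geom_line() +
  scale_colour_manual(values = c(Tleaf = "green", Tair = "blue")) +
  scale_linetype_manual(values = c(treeA = "solid", treeB = "dashed"))
```

With eight trees, linetype alone will probably not be enough to tell the lines apart, so faceting as shown above may remain the more readable choice.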

Mikko Marttila
  • Thanks a lot for your help. I was about to do a lot of work manually. Faceting is also a good idea for my data. – Konrad Bauer Jul 09 '18 at 12:53

So, following Mikko Marttila's proposal, I bound all 8 already loaded data frames (treeA, ..., treeH) together into one using tibble::lst and dplyr::bind_rows, resulting in a new data frame:

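library(tibble)  # lst()
library(dplyr)   # bind_rows()

# lst() names each list element after its object; .id stores that name in column "Test"
Liste <- lst(treeA, treeB, treeC, treeD, treeE, treeF, treeG, treeH)
new   <- bind_rows(Liste, .id = "Test")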

    >         Test                Time  Tleaf     Tair   ....
    >     1: treeA 2018-05-18 00:00:00 12.997 13.20000 
    >     2: treeA 2018-05-18 00:10:00 13.082 13.20000 
    >     3: treeA 2018-05-18 00:20:00 11.909 12.06700 
.....
    >   300: treeH 2018-05-18 00:30:00 11.315 11.53300 
    >   301: treeH 2018-05-18 00:40:00 11.251 11.46700 

After this, reshape2::melt with two columns defined as id.vars yields a long data frame with 4 columns:

long <- melt(new, id.vars = c("Time", "Test"))

     long
                           Time  Test variable        value
         1: 2018-05-18 00:00:00 treeA    Tleaf 12.997000000
         2: 2018-05-18 00:10:00 treeA    Tleaf 13.082000000
         3: 2018-05-18 00:20:00 treeA    Tleaf 11.909000000
...
       300: 2018-05-18 00:30:00 treeH    Tleaf 11.315000000
       301: 2018-05-18 00:40:00 treeH    Tleaf 11.251000000

Finally, combining the columns Test and variable with tidyr::unite yields a long-format data frame including all my data from the 8 input data frames:

long2 <- unite(long, variable, c(Test, variable), remove=TRUE)

long2
                       Zeit       variable        value
     1: 2018-05-18 00:00:00    treeA_Tleaf 12.997000000
     2: 2018-05-18 00:10:00    treeA_Tleaf 13.082000000
     3: 2018-05-18 00:20:00    treeA_Tleaf 11.909000000
...
   300: 2018-05-18 00:30:00    treeH_Tleaf 11.315000000
   301: 2018-05-18 00:40:00    treeH_Tleaf 11.251000000

This is all I need for ggplot2 to identify and load values for plotting from the different sources. If there are easier ways to achieve this, let me know in the comments. I also think there might be solutions that rely more on base R functions, but since I need to get things done I don't mind loading a lot of packages. Note that the data pasted here only serves to illustrate the structure.
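
One possibly simpler route, sketched here as an assumption rather than something I have run on the full data: tidyr::pivot_longer can do the reshaping without reshape2. The two-row `new` below is a hypothetical stand-in for the bound data frame, with only two of the 16 variables.

```r
library(tidyr)

# Hypothetical stand-in for the bound data frame `new`
new <- data.frame(
  Test  = c("treeA", "treeH"),
  Time  = as.POSIXct(c("2018-05-18 00:00:00", "2018-05-18 00:30:00"), tz = "UTC"),
  Tleaf = c(12.997, 11.315),
  Tair  = c(13.200, 11.533)
)

# Everything except the id columns is stacked into variable/value pairs
long <- pivot_longer(new, cols = -c(Test, Time),
                     names_to = "variable", values_to = "value")
```

Leaving `Test` and `variable` as separate columns (skipping the `unite()` step) would also allow mapping them to different aesthetics, for example colour for one and linetype for the other.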

Konrad Bauer