
I am trying to use the melt() function from the reshape2 package in R to stack a data frame while keeping the categorical labels for the individual observations. My question is: how do I adapt Eric Cai's code to produce multiple side-by-side notched boxplots, split by behaviours$Family (a two-level factor column) and grouped by each behavioural variable, for the data set called behaviours (a link to the dummy data is supplied below)?

My aim is to colour-code these notched boxplots by family (V4 = red and G8 = blue) and add a legend. However, I am running into a problem with dimensions when arranging the data frame with melt(), and I am having trouble deciphering the error. If anyone can help, many thanks in advance.

The reproducible dummy data can be found at the bottom of a Stack Overflow page (linked as "Reproducible data").

 Here is an example:

 I am trying to follow Eric Cai's instructions
 (1) Stack the data:
     (a) Retain the categorical (2 level factor column) for family [,1]
     (b) Retain all behavioural variables [,2:13]

  #Set vectors for labelling the data

                      behaviours.label=c("Swimming", 
                                         "Not.Swimming",
                                         "Running", 
                                         "Not.Running",
                                         "Fighting",
                                         "Not.Fighting",
                                         "Resting",
                                         "Not.Resting",
                                         "Hunting",
                                         "Not.Hunting",
                                         "Grooming",
                                         "Not.Grooming")

                         family.labels=c("V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8",
                                         "V4", "G8")

    library(tidyr)                        
    data_long <- gather(behaviours, x, Mean.Value, Swimming:Not.Grooming)
    head(data_long)  

    # stack the data while retaining the Family and behavioural variables 

    stacked.data = melt(behaviours, id = c('Family', 'behaviours'))

    # remove the column that gives the column name variable
    stacked.data = stacked.data[, -3]

    #head(stacked.data)
    colnames(stacked.data)<-c("Family", "Behaviours", "Values")

Generating the Box Plots

Generate an object called boxplots.double, which uses the formula Mean.value ~ Family + Behaviours to separate the plots into 12 pairs (i.e. each behaviour is split at the level of behaviours$Family within a single plot). In Eric Cai's code, `at =` specifies the locations of the box plots along the horizontal axis, and `xaxt = 'n'` suppresses the default horizontal axis so that a custom axis can be added with axis() and title().
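As a minimal sketch of how these two options fit together, here is a made-up example (the `toy` data frame, its two behaviours, and its values are hypothetical, not the behaviours data). It is meant to show that `at` needs one position per box, while the custom axis needs one label per tick:

```r
# Made-up data: 2 families x 2 behaviours, 20 values per group (all hypothetical)
set.seed(1)
toy <- data.frame(
  Family     = factor(rep(c("G8", "V4"), times = 40)),
  Behaviours = factor(rep(c("Running", "Swimming"), each = 40)),
  Values     = runif(80)
)

# Family varies fastest in the formula, so the boxes are drawn in the order
# G8.Running, V4.Running, G8.Swimming, V4.Swimming
n.groups <- nlevels(toy$Family) * nlevels(toy$Behaviours)

bp <- boxplot(Values ~ Family + Behaviours, data = toy,
              at   = seq_len(n.groups),   # one position per box
              xaxt = "n",                 # suppress the default x axis
              col  = c("red", "blue"))    # recycled across the pairs

# One tick per behaviour, centred between its two boxes;
# `at` and `labels` must have the same length
axis(side = 1, at = c(1.5, 3.5), labels = levels(toy$Behaviours))
```

The returned object's `names` element lists the boxes in plotting order, which is a quick way to check how many positions `at` has to supply.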

   boxplots.double = boxplot(values~Family + Behaviours, 
                             data = stacked.data, 
                             at = c(1:24), 
                             xaxt='n',
                             ylim = c(min(0, min(-3)), 
                             max(7, na.rm = T)),
                             notch=TRUE,
                             col = c("red", "blue"),
                             names = c("V4", "G8"),
                             cex.axis=1.0,
                             srt=45)

  axis(side=1, at=c(1.8, 6.8), labels=c("Swimming", 
                                       "Not.Swimming",
                                       "Running", 
                                       "Not.Running",
                                       "Fighting",
                                       "Not.Fighting",
                                       "Resting",
                                       "Not.Resting",
                                       "Hunting",
                                       "Not.Hunting",
                                       "Grooming",
                                       "Not.Grooming"), line=0.5, lwd=0)

Error message

   Error in axis(side = 1, at = 1:24, labels = c("V4", "G8"), xaxt = "n",     : 
  'at' and 'labels' lengths differ, 24 != 2
  In addition: Warning message:
  In bxp(list(stats = c(-1.20186549488911, -0.970033304559564,   -0.465271399251147,  :
  some notches went outside hinges ('box'): maybe set notch=FALSE
Alice Hobbs
  • I think the problem might be that your `at` in `boxplot` has length 6 whereas Family + Behaviours has 24 possibilities – Richard Telford Apr 19 '16 at 11:08
  • Hiya Richard, thank you for replying. I changed the length of Family + Behaviours to 24. However, I am still getting error messages. Have you got any ideas? This is the first time that I have written code to set my own horizontal (x axis) and vertical (y axis) settings. Many thanks if you can help. I just cannot figure this problem out... I added the new error message above – Alice Hobbs Apr 19 '16 at 14:56
  • Drop the names for now - there needs to be 24 of these as well – Richard Telford Apr 19 '16 at 15:27
  • Hiya Richard, I dropped the names and increased the names to 24. The new error message is: Warning message: In bxp(list(stats = c(-1.20186549488911, -0.970033304559564, -0.465271399251147, : some notches went outside hinges ('box'): maybe set notch=FALSE – Alice Hobbs Apr 19 '16 at 16:07
  • You can probably ignore that warning. It just means that the uncertainty on the median is larger than the difference between the median and the 25th or 75th percentile, i.e. the notch is bigger than the box. Basically it means you need more data. – Richard Telford Apr 19 '16 at 16:10
  • Interesting. I agree here. However, this code for some reason is still not producing the figure – Alice Hobbs Apr 19 '16 at 16:20
  • I see, you have the same `at` problem in `axis`. If you take away the `xaxt = "n"` you can get the default labels and skip the axis command – Richard Telford Apr 19 '16 at 16:50
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/109587/discussion-between-alice-hobbs-and-richard-telford). – Alice Hobbs Apr 19 '16 at 17:54
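
To illustrate the point made in the comments about the notch warning, here is a small sketch on a made-up vector (the values in `x` are hypothetical). It relies on `boxplot.stats()`, which returns the hinges in `stats` and the notch limits in `conf`, the latter computed as the median plus or minus 1.58 * IQR / sqrt(n):

```r
# Hypothetical sample with very few observations, so the notch is wide
x <- c(1, 2, 3, 4, 5)

s      <- boxplot.stats(x)
hinges <- s$stats[c(2, 4)]   # lower and upper hinge (edges of the box)
notch  <- s$conf             # lower and upper notch limits

# TRUE when the notch extends beyond the box, which is the situation
# the "some notches went outside hinges" warning reports
notch.outside <- notch[1] < hinges[1] || notch[2] > hinges[2]
notch.outside
```

With more observations per group the sqrt(n) term shrinks the notch, which is why the warning tends to disappear as the sample grows.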

1 Answer


After Richard Telford kindly offered to help, this code produces multiple side-by-side boxplots grouped at the level of the two-level categorical column called Family, using the melt() function from the reshape2 package:

   # Clear the workspace (removes all objects, not the working directory)
   rm(list=ls())

   data(behaviours)

   #Set vectors for labelling the data

   behaviours.labels=c("Swimming",  
                       "Not.Swimming",
                       "Running", 
                       "Not.Running",
                       "Fighting",
                       "Not.Fighting",
                       "Resting",
                       "Not.Resting",
                       "Hunting",
                       "Not.Hunting",
                       "Grooming",
                       "Not.Grooming")

       family.labels=c("V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8",
                       "V4", "G8")

      library(tidyr)

      #Structure the data from wide to long format 

      data_long <- gather(behaviours, x, Mean.Value, Swimming:Not.Grooming)
      head(data_long)    

   library(reshape2)

   # stack the data while retaining Family and Values calculated from behaviours[,2:13] using the melt() function

   stacked.data = melt(data_long, id = c('Family', 'x'))
   head(stacked.data)

   # remove the column that gives the column name of the `variable' from all.data

   stacked.data = stacked.data[, -3]
   head(stacked.data)

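   #Rename the column headings

   colnames(stacked.data)<-c("Family", "Behaviours", "Values")

   # Fix the behaviour order; a character column would otherwise be sorted
   # alphabetically by boxplot() and no longer match behaviours.labels
   stacked.data$Behaviours <- factor(stacked.data$Behaviours, levels = behaviours.labels)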

   #Generate the side-by-side boxplots

   windows(height=10, width=14)
   par(mar = c(9, 7, 4, 4)+0.3, mgp=c(5, 1.5, 0))

   boxplots.double = boxplot(Values~Family + Behaviours, 
                             data = stacked.data, 
                             at = c(1:24),         # one position per box (2 families x 12 behaviours)
                             ylim = c(0, 1.8),
                             xaxt = "n",           # suppress the default x axis
                             notch=TRUE,
                             col = c("red", "blue"),
                             cex.axis=0.7,
                             cex.lab=0.7,          # cex.labels is not a graphical parameter
                             ylab="Values", 
                             xlab="Behaviours")

   # One tick and one rotated label per behaviour, centred between its two boxes
   axis(side = 1, at = seq(1.5, 23.5, by = 2), labels = FALSE)
   text(seq(1.5, 23.5, by = 2), par("usr")[3] - 0.2, labels = behaviours.labels, srt = 45, pos = 1, xpd = TRUE, cex = 0.8)
   legend("topright", title = "Family", cex = 1.0, legend = c("V4", "G8"), fill = c("blue", "red"))
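
One detail worth checking with this kind of plot is the order in which boxplot() draws the groups, since the axis labels are written by hand. The sketch below uses made-up data (the `toy` data frame and the shortened `labs` vector are hypothetical) and `plot = FALSE` so that only the group order is inspected. It assumes the behaviour column arrives as a character vector, which is then sorted alphabetically unless the levels are set explicitly:

```r
# Hypothetical, shortened label vector in the intended plotting order
labs <- c("Swimming", "Not.Swimming", "Running", "Not.Running")

# Made-up long-format data with a character behaviour column
set.seed(1)
toy <- data.frame(
  Family     = rep(c("G8", "V4"), times = 20),
  Behaviours = rep(labs, each = 10),
  Values     = runif(40),
  stringsAsFactors = FALSE
)

# Character column: groups come out in alphabetical order
bp.alpha <- boxplot(Values ~ Family + Behaviours, data = toy, plot = FALSE)
bp.alpha$names

# Factor with explicit levels: groups follow the order in `labs`
toy$Behaviours <- factor(toy$Behaviours, levels = labs)
bp.fixed <- boxplot(Values ~ Family + Behaviours, data = toy, plot = FALSE)
bp.fixed$names
```

Comparing the two `names` vectors shows whether hand-written labels would line up with the boxes they are meant to describe.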


Alice Hobbs