I am working with a dataset that I display as multiple boxplots. I have manually positioned the boxplots so that they form groups, based on the week of measurement. However, listing each individual measurement date on the x-axis makes that axis unreadable.

I therefore want to replace the individual x-axis labels within each group with a single "Week #" label.

The code is based on the boxplot command:

boxplot(dfswe, ylab ='SWE [mm]', xlab ='Time', las = 2, main ='SWE over time', 
        col = colcolours[1:34],
        at = c(1,4,5,6,7,8,9,10,11,14,15,16,17,20,21,22,23,26,27,28,29,32,33,34,35,36,37,38,39,40,43,44,47,50),
        names = colcolours[1:34]) 

dfswe is a simple dataframe consisting of 34 columns.

I have tried other techniques for grouping boxplots, but haven't found a way to deal with irregular groupings: some weeks contain more measurements than others.

dput:

dput(droplevels(dfswe))
structure(list(`07/02/2019` = c(82.68852496, 84.32592149, 90.05680936, 
81.05112843, NA, NA), `11/02/2019` = c(91.6942059, 79.41373189, 
91.6942059, 79.41373189, NA, NA), `11/02/2019` = c(63.03976655, 
72.86414576, 72.86414576, 73.68284402, 78.59503363, 70.40805096
), `13/02/2019` = c(72.86414576, 72.86414576, 87.60071456, 87.60071456, 
NA, NA), `13/02/2019` = c(87.60071456, 100.6998868, 88.41941283, 
88.41941283, 81.05112843, NA), `13/02/2019` = c(75.32024056, 
68.77065442, 74.50154229, 74.50154229, 62.22106829, 58.94627522
), `13/02/2019` = c(86.78201629, 76.13893882, 72.86414576, NA, 
NA, NA), `16/02/2019` = c(46.39290179, 48.84899659, 50.75929255, 
NA, NA, NA), `16/02/2019` = c(45.84710295, 48.30319775, 57.30887869, 
34.38532721, 27.50826177, NA), `19/02/2019` = c(79.41373189, 
71.22674922, 62.22106829, 54.85278388, 27.01704281, NA), `19/02/2019` = c(27.83574108, 
43.39100815, 44.20970641, 55.67148215, NA, NA), `19/02/2019` = c(17.19266361, 
24.56094801, 0, NA, NA, NA), `19/02/2019` = c(34.38532721, NA, 
NA, NA, NA, NA), `20/02/2019` = c(77.77633536, 65.49586135, 62.22106829, 
22.92355147, NA, NA), `20/02/2019` = c(22.92355147, 15.55526707, 
NA, NA, NA, NA), `20/02/2019` = c(28.65443934, 35.20402548, 42.57230988, 
54.03408562, NA, NA), `20/02/2019` = c(14.7365688, 22.10485321, 
NA, NA, NA, NA), `26/02/2019` = c(85.96331803, 72.86414576, 76.13893882, 
49.12189602, 29.47313761, NA), `26/02/2019` = c(0, 0, 0, 0, 0, 
NA), `26/02/2019` = c(0, 0, 0, 0, 0, NA), `26/02/2019` = c(0, 
0, 0, 0, 0, NA), `04/03/2019` = c(28.65443934, 32.74793068, 39.29751681, 
44.20970641, NA, NA), `06/03/2019` = c(88.41941283, 85.96331803, 
76.95763709, 29.47313761, 38.47881855, NA), `06/03/2019` = c(3.192923241, 
3.192923241, 3.192923241, 3.192923241, NA, NA), `06/03/2019` = c(3.192923241, 
3.192923241, 25.78899541, 3.192923241, 49.12189602, NA), `06/03/2019` = c(3.192923241, 
3.192923241, 3.192923241, 3.192923241, NA, NA), `08/03/2019` = c(85.96331803, 
82.68852496, 70.40805096, 67.95195616, 27.83574108, NA), `08/03/2019` = c(15.55526707, 
18.83006014, 11.46177574, 10.64307747, NA, NA), `08/03/2019` = c(16.37396534, 
22.10485321, 13.09917227, 11.46177574, NA, NA), `11/03/2019` = c(112.9803608, 
103.9746799, 98.24379203, 50.75929255, 29.47313761, NA), `11/03/2019` = c(25.37964627, 
24.56094801, 21.28615494, 19.64875841, NA, NA), `11/03/2019` = c(28.65443934, 
22.92355147, 19.64875841, 19.64875841, NA, NA), `18/03/2019` = c(139.1787054, 
130.9917227, 129.3543262, 54.03408562, 34.38532721, NA), `28/03/2019` = c(110.524266, 
115.4364556, 81.86982669, 0, 0, NA)), row.names = c(NA, -6L), class = "data.frame")
UseR10085
Stijn

1 Answer

Here is a solution using the tidyverse:

# libraries
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tidyverse)


# make the column names unique: repeated dates get a suffix (.1, .2, ...)
colnames(dfswe) <- make.unique(colnames(dfswe))


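# reshape to long format and derive date, week and transect per column
dfswe.p <- dfswe %>%
  gather(date1, value) %>%
  # the first 10 characters of the column name are the date (dd/mm/yyyy)
  mutate(date = dmy(substr(date1, 1, 10))) %>%
  mutate(week = as.factor(week(date))) %>%
  group_by(date, date1) %>%
  # no suffix means first transect (0); otherwise use the make.unique suffix digit
  mutate(transect = case_when(nchar(date1) == 10 ~ 0,
                              TRUE ~ as.numeric(substr(date1, 12, 12)))) %>%
  # shift to 1-based transect numbers
  mutate(transect = transect + 1) %>%
  ungroup()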

# add the week without measurements (week 12) so it still shows on the axis
dfswe.p$week <- factor(dfswe.p$week, levels = c(paste(6:13)))

# make plot
ggplot(dfswe.p) + 
  geom_boxplot(position = position_dodge(preserve = "single"),
               aes(x=week, y=value,group=interaction(date,transect),fill=factor(transect))) + 
  labs(title="SWE over time",
       x ="TIME [Week # of 2019]",
       y = "SWE [mm]") +
  scale_x_discrete(drop=FALSE) +
  guides(fill=guide_legend(title="transect"))

Created on 2020-04-12 by the reprex package (v0.3.0)
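
If you would rather stay with base boxplot, one possible approach is to suppress the default axis and draw one tick per week with axis(). The following is only a minimal sketch, not tested against the full dfswe: `dates` is a hypothetical stand-in for names(dfswe), `toy` is made-up data, and the week is taken as the ISO 8601 week number (format "%V"), which is an assumption and can differ from lubridate's week() used above.

```r
# Hypothetical column names standing in for names(dfswe) (dd/mm/yyyy, duplicates allowed)
dates <- c("07/02/2019", "11/02/2019", "11/02/2019", "13/02/2019",
           "19/02/2019", "20/02/2019", "26/02/2019")

# ISO 8601 week number per column (assumption: ISO weeks are the wanted grouping)
week <- as.integer(format(as.Date(dates, format = "%d/%m/%Y"), "%V"))

# Running group index per week, then leave two empty slots between groups
grp <- match(week, unique(week))
at  <- seq_along(week) + 2 * (grp - 1)

# One tick per week, centred under that week's boxes
tick_at  <- tapply(at, week, mean)
tick_lab <- paste("Week", names(tick_at))

# Made-up data: six values per column, only to make the sketch runnable
set.seed(1)
toy <- as.data.frame(matrix(runif(6 * length(dates), 0, 100), nrow = 6))

# xaxt = "n" suppresses the per-column labels; axis() adds the week labels
boxplot(toy, at = at, xaxt = "n", ylab = "SWE [mm]", main = "SWE over time")
axis(1, at = as.numeric(tick_at), labels = tick_lab)
```

Because `at` is computed from the week of each column, weeks with different numbers of measurements are handled without typing the positions by hand. With the real data you would pass dfswe instead of `toy` and keep the col argument from the question.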

captcoma
  • Thanks for thinking along! I think there might be a problem with me using the colnames of SWE as dates. Even when setting them with as.Date, mutating yields an error: Error: Column names `2020-02-11`, `2020-02-13`, `2020-02-13`, `2020-02-13`, `2020-02-16`, ... (and 16 more) must not be duplicated. Use .name_repair to specify repair. – Stijn Apr 11 '20 at 17:31
  • Thank you for the df. Some dates appear multiple times in the columns (e.g. 11/03/2019), is this intended? – captcoma Apr 11 '20 at 18:10
  • Unfortunately, yes! As there may be multiple transects measured on a single date, there will be duplicates in terms of dates – Stijn Apr 11 '20 at 18:12
  • How did you treat these duplicates in the initial graph? It looks like there was only one boxplot per day. – captcoma Apr 11 '20 at 18:12
  • Do you want to use all measurements of all duplicates for the boxplots? – captcoma Apr 11 '20 at 18:13
  • In the initial graph each column, regardless of date, creates its own boxplot. The plan would've been to include all 'duplicates' of dates, as the different colours symbolize different transects. So on 13/02, 4 transects have been measured, whereas there were only 2 transects measured on 11/02. – Stijn Apr 11 '20 at 18:19
  • That is actually pretty neat! Thanks a dozen! – Stijn Apr 12 '20 at 07:57