
How can I place several boxplots side by side for comparison? I am looking for solutions based on subplots as well as alternatives, because subplots within a single figure become too crowded, as the reproducible data below demonstrates. One idea is a dropdown menu for choosing the group. My boxplot does NOT convert well into plotly, so the subplot solution from the thread below does not seem to work. Solutions to the subplot problem would be very helpful: (Combining Multiple Plots in R Together)

The density plot in this other thread is a nice solution, but my purpose requires two more dimensions, time and additional variables: (Density plots of four groups in the same plot)

As I continue to work with data across time, I am looking for better visualizations that make the presented data easier to read. Any input on this, and on how to improve the current visualization, will be highly appreciated.

I have generated a random sample based on some information from the actual data. My boxplot code for the overall sample across time is also provided below. The aim is to produce a set of such plots for the 5 nation groups in vecNation.

library(tidyverse)
library(reshape2)

generate a random sample based upon some info from actual data:

1) number of obs based upon years: 50 for each nation but unequal across time;

2) randomize order of nations;

3) range of sampling based upon 25% and 75% quantiles;

`vecYrs <- rep(2011:2015, 50)`
`vecNation <- rep(unique(edPayGSIB$nation), 50) %>% sample()`

4) create a df

`repData <- data.frame(nation = vecNation,
                      yrRep = vecYrs,
                      totalPay = sample(6310:17028,
                                        length(vecYrs),
                                        replace = TRUE),
                      ltipvalue = sample(3880:11076,
                                         length(vecYrs),
                                         replace = TRUE),
                      totalcompensation = sample(976:5752,
                                                 length(vecYrs),
                                                 replace = TRUE))`
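The object `edPayGSIB` is the actual data and is not included here, so the lines above cannot be run as posted. A minimal self-contained sketch of the same sample follows, assuming the four nation codes that appear in the subplot call further down (CH, EU, UK, US); the real data may contain more groups, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sample is repeatable

# Assumption: nation codes taken from the subplot call later in the post
nationCodes <- c("CH", "EU", "UK", "US")

# 250 observations: each year 2011-2015 appears 50 times
vecYrs <- rep(2011:2015, 50)

# Recycle the nation codes to the same length, then shuffle the order
vecNation <- sample(rep(nationCodes, length.out = length(vecYrs)))

repData <- data.frame(
  nation = vecNation,
  yrRep = vecYrs,
  totalPay = sample(6310:17028, length(vecYrs), replace = TRUE),
  ltipvalue = sample(3880:11076, length(vecYrs), replace = TRUE),
  totalcompensation = sample(976:5752, length(vecYrs), replace = TRUE)
)
```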

check year dispersion upon randomization

`repData %>%
  filter(nation == 'US') %>%
  count(yrRep)`

1. data format for plot

`repValueYrs_data <- repData %>%
  select(yrRep, totalPay, ltipvalue, totalcompensation) %>%
  reshape2::melt(id.vars = "yrRep")`
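The same long format can probably also be produced with `tidyr::pivot_longer()`. The sketch below keeps the `nation` column in the long data, which makes per-group plots possible later without filtering first. The stand-in data and its four nation codes are assumptions, not the actual data.

```r
library(tidyr)

set.seed(1)
# Assumption: stand-in data with four nation codes (CH, EU, UK, US)
repData <- data.frame(
  nation = sample(rep(c("CH", "EU", "UK", "US"), length.out = 250)),
  yrRep = rep(2011:2015, 50),
  totalPay = sample(6310:17028, 250, replace = TRUE),
  ltipvalue = sample(3880:11076, 250, replace = TRUE),
  totalcompensation = sample(976:5752, 250, replace = TRUE)
)

# One row per observation and variable; nation and yrRep are kept as ids
dataLong <- pivot_longer(
  repData,
  cols = c(totalPay, ltipvalue, totalcompensation),
  names_to = "variable",
  values_to = "value"
)
```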

2. plot with title, axes, captions

`repValueYrs_plot <- ggplot(repValueYrs_data, 
                           aes(x = as.factor(yrRep),
                               y = value,
                               fill = variable)) +
  geom_boxplot()`

boxplot with 3 variables across time, produced by the code above

functionalize the above for subgroups

`plotBoxValue <- function(dataWide){

  dataLong <- dataWide %>%
    select(yrRep, totalPay, ltipvalue, totalcompensation) %>%
    ## 1c) into long format
    reshape2::melt(id.vars = "yrRep")

  dataPlot <- ggplot(dataLong, 
                     aes(x = as.factor(yrRep),
                         y = value,
                         fill = variable)) +
    geom_boxplot()

  ## return the plot explicitly; a trailing assignment only returns it invisibly
  dataPlot
}`

same plots for each subgroup, stored in a list

alphabetical names:

`vecNation <- unique(repData$nation) %>%
  sort()`

empty list to add df as elements

`payValueYrs_plot_List <- list()`

same plot for each nation

`for(i in seq_along(vecNation)){
  dataForPlot <- repData %>%
    filter(nation == vecNation[i])
  payValueYrs_plot_List[[i]] <- plotBoxValue(dataForPlot)
}
`

name list elements

`names(payValueYrs_plot_List) <- vecNation`
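As a possible alternative to a list of separate plots, a single ggplot with `facet_wrap()` draws one panel per nation on shared axes. This is only a sketch on stand-in data (the four nation codes are an assumption), and whether it stays readable with more groups would need checking.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Assumption: stand-in data with four nation codes (CH, EU, UK, US)
repData <- data.frame(
  nation = sample(rep(c("CH", "EU", "UK", "US"), length.out = 250)),
  yrRep = rep(2011:2015, 50),
  totalPay = sample(6310:17028, 250, replace = TRUE),
  ltipvalue = sample(3880:11076, 250, replace = TRUE),
  totalcompensation = sample(976:5752, 250, replace = TRUE)
)

# Long format that keeps nation, so it can be used as the facet variable
dataLong <- pivot_longer(
  repData,
  cols = c(totalPay, ltipvalue, totalcompensation),
  names_to = "variable",
  values_to = "value"
)

# One panel per nation; years on x, one box per variable within each year
facetPlot <- ggplot(dataLong,
                    aes(x = factor(yrRep), y = value, fill = variable)) +
  geom_boxplot() +
  facet_wrap(~ nation, ncol = 2) +
  labs(x = "Year", y = "Value", fill = "Variable")
```

Printing `facetPlot` draws the figure; `ncol` controls how the panels are arranged.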

conversion into plotly objects does not work well with the above boxplots

`library(plotly)`
`ggplotly(payValueYrs_plot_List[['CH']])`

failed conversion to a plotly object, which is likely what causes the problems in the subplot below

very crowded and difficult to scale if more subgroups are included

`t1 <- subplot(payValueYrs_plot_List[['CH']], 
              payValueYrs_plot_List[['EU']],
              payValueYrs_plot_List[['UK']],
              payValueYrs_plot_List[['US']],
              nrows = 4, margin = .05, 
              ## proportion of subplot height must sum to 1
              heights = c(.25, .25, .25, .25),
              shareX = TRUE)`

subplot: first requiring conversion, then beautifying
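The dropdown idea mentioned at the top could perhaps be built directly with `plot_ly()` rather than by converting ggplot objects. The following is a minimal sketch under several assumptions: stand-in data with four nation codes, hypothetical colours in `varCols`, and one box trace per nation/variable pair whose visibility is toggled by the menu. It has not been tried on the actual data.

```r
library(plotly)
library(tidyr)

set.seed(1)
# Assumption: stand-in data with four nation codes (CH, EU, UK, US)
repData <- data.frame(
  nation = sample(rep(c("CH", "EU", "UK", "US"), length.out = 250)),
  yrRep = rep(2011:2015, 50),
  totalPay = sample(6310:17028, 250, replace = TRUE),
  ltipvalue = sample(3880:11076, 250, replace = TRUE),
  totalcompensation = sample(976:5752, 250, replace = TRUE)
)

dataLong <- pivot_longer(
  repData,
  cols = c(totalPay, ltipvalue, totalcompensation),
  names_to = "variable",
  values_to = "value"
)

nations <- sort(unique(dataLong$nation))
vars <- c("totalPay", "ltipvalue", "totalcompensation")
# Hypothetical colours, one per variable, reused for every nation
varCols <- c(totalPay = "#1b9e77", ltipvalue = "#d95f02",
             totalcompensation = "#7570b3")

# Traces are added nation by nation, three per nation;
# only the first nation's traces start out visible
p <- plot_ly()
for (i in seq_along(nations)) {
  for (v in vars) {
    d <- dataLong[dataLong$nation == nations[i] & dataLong$variable == v, ]
    p <- add_boxplot(p, data = d,
                     x = ~as.character(yrRep), y = ~value,
                     name = v, legendgroup = v,
                     color = I(varCols[[v]]),
                     visible = (i == 1))
  }
}

# One button per nation; each sets the visibility of all traces at once,
# in the same nation-by-nation order in which the traces were added
buttons <- lapply(seq_along(nations), function(i) {
  list(method = "restyle",
       args = list("visible",
                   as.list(rep(seq_along(nations) == i, each = length(vars)))),
       label = nations[i])
})

p <- layout(p,
            boxmode = "group",
            xaxis = list(title = "Year", categoryorder = "category ascending"),
            yaxis = list(title = "Value"),
            updatemenus = list(list(buttons = buttons)))
```

Because the visibility vectors depend on trace order, the loop order and the `rep(..., each = length(vars))` call have to stay in sync if groups or variables are added.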

density plot solution: requiring dimensions of time and more variables

`t2 <- repData %>%
    ggplot(aes(x = totalPay,
             y = after_stat(count),
             fill = factor(nation))) +
    geom_density(alpha = 0.6) +
    scale_color_manual(values = c("red", "green","blue","yellow")) +
    theme_bw()`

density plot: considerations of time and more variables would be required
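One way the density plot might take on the time and variable dimensions is `facet_grid()`, with variables as rows and years as columns. This is a sketch on stand-in data (four nation codes assumed); with few observations per cell the density estimates will be rough.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Assumption: stand-in data with four nation codes (CH, EU, UK, US)
repData <- data.frame(
  nation = sample(rep(c("CH", "EU", "UK", "US"), length.out = 250)),
  yrRep = rep(2011:2015, 50),
  totalPay = sample(6310:17028, 250, replace = TRUE),
  ltipvalue = sample(3880:11076, 250, replace = TRUE),
  totalcompensation = sample(976:5752, 250, replace = TRUE)
)

dataLong <- pivot_longer(
  repData,
  cols = c(totalPay, ltipvalue, totalcompensation),
  names_to = "variable",
  values_to = "value"
)

# Rows: variables, columns: years, fill: nation.
# Free scales because the three variables cover different value ranges.
densityGrid <- ggplot(dataLong, aes(x = value, fill = nation)) +
  geom_density(alpha = 0.6) +
  facet_grid(variable ~ yrRep, scales = "free") +
  theme_bw()
```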

Wei