I have a dataframe with columns for company names (id), functions (category), indicators (factor), and the values of those indicators (value). The goal is to draw several boxplots showing the distribution of factor values by function. Data:

sample_data <- structure(list(id = c("Chee, Chelsea", "Chee, Chelsea", "Chee, Chelsea", 
"Chee, Chelsea", "Chee, Chelsea", "Chee, Chelsea", "Chee, Chelsea", 
"Chee, Chelsea", "Chee, Chelsea", "Chee, Chelsea", "Hatchett, Dante", 
"Hatchett, Dante", "Hatchett, Dante", "Hatchett, Dante", "Hatchett, Dante", 
"Hatchett, Dante", "Hatchett, Dante", "Hatchett, Dante", "Hatchett, Dante", 
"Hatchett, Dante", "Hagemeier, Wilmer", "Hagemeier, Wilmer", 
"Hagemeier, Wilmer", "Hagemeier, Wilmer", "Hagemeier, Wilmer", 
"Hagemeier, Wilmer", "Hagemeier, Wilmer", "Hagemeier, Wilmer", 
"Hagemeier, Wilmer", "Hagemeier, Wilmer", "el-Jabour, Suhaa", 
"el-Jabour, Suhaa", "el-Jabour, Suhaa", "el-Jabour, Suhaa", "el-Jabour, Suhaa", 
"el-Jabour, Suhaa", "el-Jabour, Suhaa", "el-Jabour, Suhaa", "el-Jabour, Suhaa", 
"el-Jabour, Suhaa", "Salihi, Divya", "Salihi, Divya", "Salihi, Divya", 
"Salihi, Divya", "Salihi, Divya", "Salihi, Divya", "Salihi, Divya", 
"Salihi, Divya", "Salihi, Divya", "Salihi, Divya", "al-Jamil, Jaad", 
"al-Jamil, Jaad", "al-Jamil, Jaad", "al-Jamil, Jaad", "al-Jamil, Jaad", 
"al-Jamil, Jaad", "al-Jamil, Jaad", "al-Jamil, Jaad", "al-Jamil, Jaad", 
"al-Jamil, Jaad", "Porter, Elijah", "Porter, Elijah", "Porter, Elijah", 
"Porter, Elijah", "Porter, Elijah", "Porter, Elijah", "Porter, Elijah", 
"Porter, Elijah", "Porter, Elijah", "Porter, Elijah", "Ridgley, Matthew", 
"Ridgley, Matthew", "Ridgley, Matthew", "Ridgley, Matthew", "Ridgley, Matthew", 
"Ridgley, Matthew", "Ridgley, Matthew", "Ridgley, Matthew", "Ridgley, Matthew", 
"Ridgley, Matthew", "Oats, Jiair", "Oats, Jiair", "Oats, Jiair", 
"Oats, Jiair", "Oats, Jiair", "Oats, Jiair", "Oats, Jiair", "Oats, Jiair", 
"Oats, Jiair", "Oats, Jiair", "Thompson, Asien", "Thompson, Asien", 
"Thompson, Asien", "Thompson, Asien", "Thompson, Asien", "Thompson, Asien", 
"Thompson, Asien", "Thompson, Asien", "Thompson, Asien", "Thompson, Asien"
), category = c("will", "will", "will", "will", "will", "deal", 
"deal", "deal", "deal", "deal", "will", "will", "will", "will", 
"will", "deal", "deal", "deal", "deal", "deal", "will", "will", 
"will", "will", "will", "deal", "deal", "deal", "deal", "deal", 
"will", "will", "will", "will", "will", "deal", "deal", "deal", 
"deal", "deal", "will", "will", "will", "will", "will", "deal", 
"deal", "deal", "deal", "deal", "will", "will", "will", "will", 
"will", "deal", "deal", "deal", "deal", "deal", "will", "will", 
"will", "will", "will", "deal", "deal", "deal", "deal", "deal", 
"will", "will", "will", "will", "will", "deal", "deal", "deal", 
"deal", "deal", "will", "will", "will", "will", "will", "deal", 
"deal", "deal", "deal", "deal", "will", "will", "will", "will", 
"will", "deal", "deal", "deal", "deal", "deal"), factor = c("f1", 
"f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", 
"f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", 
"f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", 
"f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", 
"f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", 
"f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", 
"f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", 
"f4", "f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", 
"f5", "f1", "f2", "f3", "f4", "f5", "f1", "f2", "f3", "f4", "f5"
), value = c(0.339243657473717, 0.384596983617986, 0.0903604942291727, 
0.622299975399853, 0.878426613848986, 0.619932561033423, 0.768372484010595, 
1.3720186467304, 0.516137222110122, 0.0939216356224454, 0.423330163104718, 
1.09092813025095, 1.19417177287019, 0.719465669220584, 0.452970378504298, 
-0.262289594598489, 1.22689933746316, 0.816430627598565, 0.225885114542236, 
0.632040744287071, 0.104560237280194, 0.381714309901825, 0.62676961473864, 
-0.0497874636348734, 0.950027143102881, 0.770846095346556, 0.148980694426281, 
0.0441704598142616, 0.490668306336729, 1.02471661138678, 0.156174816905824, 
0.31746617387743, 0.156617889567164, 0.0424322867402526, -0.468906139291209, 
0.240259904852959, 0.477319222715837, 0.838721253256597, 0.445074674905288, 
0.549554109125289, -0.226713556713281, 0.118250559860738, 0.479740692801046, 
0.0787136404239509, -0.796681488556265, 0.191482860752725, 0.28786926088113, 
0.87763251227066, 0.0338514723682836, 0.235576477670443, -0.0690121807547427, 
-0.268401095627916, 0.525430078156439, -0.292741297006626, 0.204765160519623, 
0.332993835314161, 0.410545410766758, 0.686637667590553, 0.149842772573679, 
0.700177571955539, 0.945997668337351, 0.32488054941514, 0.993151127821943, 
0.524358293364559, 0.743356027756573, 0.0247172637782763, 0.205738918048416, 
0.922272051144243, 0.264568168014215, 0.800444985485889, 0.0490291076301935, 
-0.182296829387635, 0.275266536310165, 0.723462807292679, 1.37681045703127, 
0.996572375062412, 0.78567025822639, 0.852269626584109, -0.257367673879751, 
0.998810021760118, 0.90491311313343, 1.33803924723801, 1.44241236118906, 
1.20343139126242, 0.666758519859951, 1.0151075718858, 0.820298727592033, 
1.26452544892297, 0.937448475295236, 0.363135203972494, 0.633056112436769, 
0.965685304671053, 0.640992301458128, -0.083835315236123, 1.14088770490309, 
0.402326393668432, 0.117951239403618, 0.403472929718899, 1.32109715429833, 
0.937023659882023)), class = "data.frame", row.names = c(NA, 
-100L))

I would like to automate this process. Specifically, I would like to know how I can:

  1. Filter my dataframe within a function for each category (will and deal);
  2. Make boxplots of the factors within each category.

I tried to write an anonymous (lambda) function, but I did not understand the indexing or how to filter the abstract dataframe that is passed in as the function's argument. Conceptually, I understand that I am supposed to do something like this:

plots_fun <- function(dataframe){
  a <- ggplot(data = dataframe[,1], ...)
}

I also thought about using lapply, but my first step is to write the function, which is what I am actually struggling with.

For my sample data, the desired output is two plots, one for will and one for deal:

library(ggplot2)
library(dplyr)  # provides %>% and filter()

ggplot(data = sample_data %>% filter(category == "will"),
       aes(factor, value)) +
  geom_boxplot()

ggplot(data = sample_data %>% filter(category == "deal"),
       aes(factor, value)) +
  geom_boxplot()
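
A minimal sketch of how the two manual calls above could be wrapped in a function and run over every category with lapply. The helper name `plot_category`, its argument `category_name`, and the stand-in data frame `demo_data` (random values with the same four columns as the sample data) are all assumptions made for illustration, not part of the original data:

```r
library(ggplot2)

# Stand-in data with the same columns as the sample data.
# The values are randomly generated placeholders, not the real figures.
set.seed(1)
demo_data <- data.frame(
  id       = rep(paste0("id", 1:10), each = 10),
  category = rep(rep(c("will", "deal"), each = 5), times = 10),
  factor   = rep(paste0("f", 1:5), times = 20),
  value    = rnorm(100, mean = 0.5, sd = 0.4)
)

# Hypothetical helper: keep the rows of one category and
# draw a boxplot of value for each factor.
plot_category <- function(dataframe, category_name) {
  subset_df <- dataframe[dataframe$category == category_name, ]
  ggplot(subset_df, aes(x = factor, y = value)) +
    geom_boxplot() +
    ggtitle(category_name)
}

# One plot per distinct category, collected in a named list.
categories <- unique(demo_data$category)
plots <- lapply(categories, function(category_name) {
  plot_category(demo_data, category_name)
})
names(plots) <- categories

# Display a single plot by name.
plots[["will"]]
```

The filtering is done with base R row indexing on the function's own argument, so the function does not depend on any global data frame; `dplyr::filter()` could be swapped in if preferred.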
rg4s
  • I have also done some research on the question; this answer (https://stackoverflow.com/questions/42025120/how-to-automatize-ggplot-plots-in-a-function-r) does not fit my case, to my mind. – rg4s Feb 10 '21 at 09:02
  • Have you looked at facets? Could you show what kind of plot you would like to have? – NelsonGon Feb 10 '21 at 09:09
  • I am not sure this needs automatisation. This really speaks very loudly "facet" to me. – tjebo Feb 10 '21 at 09:10
  • If you really need to produce separate plots, there are plenty of threads out there on how to do that. This is a start: https://stackoverflow.com/questions/22309285/how-to-use-a-variable-to-specify-column-name-in-ggplot – tjebo Feb 10 '21 at 09:12
  • Wow, eureka, facets!! Thank you, colleagues – rg4s Feb 10 '21 at 09:12
  • @NelsonGon maybe you know possible ways to facet more than 20 categories? The image (plot) looks messy and unreadable... – rg4s Feb 10 '21 at 09:22
  • @tjebo or maybe you do? – rg4s Feb 10 '21 at 09:22
  • Plenty of ways. How messy it looks depends **very** much on your device size - don't trust the viewer panel wherever you're previewing your plot. 0) (and most important): reconsider whether you really need to show 20 facets. 1) increase the width and height of the final plot. 2) remove fonts and decrease font size as much as possible. 3) play around with facet_wrap or facet_grid, number of columns / rows etc – tjebo Feb 10 '21 at 09:32
  • Related to @tjebo's comment: https://stackoverflow.com/questions/65816981/how-to-set-adequate-space-for-facet-wrap-in-r – teunbrand Feb 10 '21 at 10:11
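
For reference, the facet approach suggested in the comments might look roughly like the sketch below. It assumes a data frame with the same four columns as in the question; `demo_data` is a randomly generated stand-in, and the column count, file name, and output size are illustrative guesses rather than recommended settings:

```r
library(ggplot2)

# Stand-in data with the same columns as the question's sample
# (randomly generated placeholder values).
set.seed(1)
demo_data <- data.frame(
  id       = rep(paste0("id", 1:10), each = 10),
  category = rep(rep(c("will", "deal"), each = 5), times = 10),
  factor   = rep(paste0("f", 1:5), times = 20),
  value    = rnorm(100, mean = 0.5, sd = 0.4)
)

# One panel per category; ncol controls how the panels are laid out.
p <- ggplot(demo_data, aes(x = factor, y = value)) +
  geom_boxplot() +
  facet_wrap(~ category, ncol = 2)

p

# With many categories, saving at a larger size is usually more readable
# than the preview pane. Uncomment to write the file (size in inches):
# ggsave("boxplots_by_category.png", plot = p, width = 12, height = 8)
```

With 20 or more categories, adjusting `ncol` (or `nrow`) together with the saved width and height is the part most likely to need experimentation.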

0 Answers