
Is it possible to create something like this in R?

I have 7 different variables that I want to include for product A, and the same 7 for the rest of the products (B, C, ...).

However, I also want to include the summary values (min, mean and max).

[Image: example boxplot]

How can I create this?

I already have all the different variables as a "Value". I was trying something like protein~product, but I want all the variables inside product AAA. If possible, I want the same for all products (I don't know if that will be possible given the number of variables).

This is part of the data:

product  protein  fat  moisture  ash  fiber  starch  sugar
AAA      49       1.0  NA        NA   10     7.4     6.1
BBB      35       1.6  NA        NA   10.6   8.5     10.0
AVF      40       1.2  NA        NA   6      7.8     6.3

Thank you!

Ana Raquel
  • It would be helpful if you provided your data by using `dput(MyData)` and then pasting the result into your question. – G5W Feb 03 '17 at 14:29
  • @G5W I tried that, but the database is too big. I cannot see the entire output... – Ana Raquel Feb 03 '17 at 14:32

1 Answer

You can start with this example. EDIT: I added how to get from your data format to the long data format required for the plot. You can also find more information in a similar question: Plot multiple boxplot in one graph

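# load the packages first: %>% and spread() below need dplyr and tidyr
library(ggplot2)
library(dplyr)
library(tidyr)

# simulate the data: 3 products x 100 ids x 3 ingredients = 900 rows
set.seed(314)

id <- rep(1:100, each = 3)   # recycled across the three products
prod <- paste("product", rep(letters[1:3], each = 300))
ing <- rep(c('protein', 'fat', 'starch'), 300)
mg <- rnorm(900, 5, 2)

df <- data.frame(prod, ing, mg, id)

# reconstruct your data format (wide: one column per ingredient)
yourdata <- df %>% group_by(id, prod) %>% spread(ing, mg)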

# get your format in long format
pd <- yourdata %>% gather(ing, mg, -id, -prod)

# use the long format for the plot

ggplot(pd, aes(x = ing, y = mg, fill = ing)) + geom_boxplot() +
  facet_grid(~prod)

[Figure: boxplots of mg per ingredient (ing), one panel per product]
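The question also asks for min, mean and max rather than the default quartiles. A minimal sketch of one way to do that, assuming the same simulated long-format columns (`prod`, `ing`, `mg`) as above: compute the three values per product and ingredient with `dplyr`, then pass them to `geom_boxplot()` with `stat = "identity"`. The summary column names `lo`, `avg` and `hi` are made up for this illustration.

```r
library(ggplot2)
library(dplyr)

# Simulated long-format data with the same columns as `pd` above
set.seed(314)
pd <- data.frame(
  prod = rep(paste("product", letters[1:3]), each = 300),
  ing  = rep(c("protein", "fat", "starch"), times = 300),
  mg   = rnorm(900, 5, 2)
)

# One row per product/ingredient holding min, mean and max
# (lo, avg, hi are hypothetical names chosen for this sketch)
smry <- pd %>%
  group_by(prod, ing) %>%
  summarise(lo  = min(mg, na.rm = TRUE),
            avg = mean(mg, na.rm = TRUE),
            hi  = max(mg, na.rm = TRUE),
            .groups = "drop")

# stat = "identity" draws the box from the supplied values:
# the box spans min to max and the middle line marks the mean
p <- ggplot(smry, aes(x = ing, fill = ing)) +
  geom_boxplot(aes(ymin = lo, lower = lo, middle = avg,
                   upper = hi, ymax = hi),
               stat = "identity") +
  facet_grid(~prod)
print(p)
```

Because the box edges are set to the minimum and maximum, no whiskers are drawn. A variable that is entirely `NA` within a product (such as moisture and ash in the sample rows) would give infinite or `NaN` summaries here, so it is probably best to filter those out before summarising.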

Wietze314
  • @Wietze314 This seems really nice! It is exactly what I want to do! However, one of my problems is that your "mg" works only if the x, mean and sd are equal for all the parameters, right? In my case I need the summary of each "ing" per "prod" (analysis per product)... – Ana Raquel Feb 03 '17 at 15:09
  • Your data shows just 3 rows. If there is one row per product there is not much to summarize, is there? – Wietze314 Feb 03 '17 at 15:18
  • @Wietze314 In total I have 140000 rows, so I really need to summarize. This case was just an example, otherwise it would be impossible. My fault, I forgot to say that. – Ana Raquel Feb 03 '17 at 15:30
  • Ok, no problem. Because I generated my data, it's all the same, but it's just an illustration. If you use `pd <- yourdata %>% gather(ing, mg, -product)` on your data then it should behave the same. There is a way to customize the boxplot to plot min, mean, max instead of the default statistics. Info can be found here: http://docs.ggplot2.org/current/geom_boxplot.html – Wietze314 Feb 03 '17 at 15:31
  • @Wietze314 I'm not able to apply your code to my data. Can you help me, please? id <- should it be a column with the row number? prod <- should it be a column with the products? ing <- should it be the names of my analyses ("protein", "starch", ...)? I only have that information as the header of each column. mg <- should I change the first number to the number of rows that I have in my df? Thank you! – Ana Raquel Feb 06 '17 at 09:35
  • Hi Ana, I tried explaining how you can reshape your data with `dplyr` and `tidyr` using the function `gather`. If you can give us the output of, for example, `dput(head(YourDataFrame))` I can help you. In the `gather` statement, `ing` and `mg` are the names for the new columns. – Wietze314 Feb 06 '17 at 10:58
  • I found out that I needed to melt my data. Thanks a lot anyway! – Ana Raquel Feb 06 '17 at 14:54
  • `melt` and `gather` are similar functions. Good that you found a solution. – Wietze314 Feb 06 '17 at 15:42
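
The reshaping step discussed in these comments can be sketched on the three sample rows from the question. This is only an illustration under the assumption that the real data frame has the same column layout; `wide` and `long` are names chosen here, while `ing` and `mg` follow the answer.

```r
library(tidyr)

# The three sample rows from the question, typed in by hand
wide <- data.frame(
  product  = c("AAA", "BBB", "AVF"),
  protein  = c(49, 35, 40),
  fat      = c(1.0, 1.6, 1.2),
  moisture = NA_real_,
  ash      = NA_real_,
  fiber    = c(10, 10.6, 6),
  starch   = c(7.4, 8.5, 7.8),
  sugar    = c(6.1, 10.0, 6.3)
)

# Stack every column except `product` into name/value pairs:
# `ing` receives the old column names, `mg` the measurements
long <- gather(wide, ing, mg, -product)

head(long)
```

The resulting `long` has one row per product and variable, which is the shape the `ggplot` call in the answer expects (with `product` in place of `prod`). No `id` column is needed for this step. In current `tidyr`, `pivot_longer(wide, -product, names_to = "ing", values_to = "mg")` is the newer equivalent of `gather`.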