
This is my first post, so go easy. For the past ~5 years I've been able to either tweak my R code until it worked or find an answer on this or various other sites. Trust me when I say that I've looked for an answer! I have a working script that creates the attached boxplot in base R: https://i.stack.imgur.com/NaATo.jpg

This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons. I've looked at the following questions and they are close, but not complete: "Why does a boxplot in ggplot requires axis x and y?" and "How do you draw a boxplot without specifying x axis?"

My data is basically like "mtcars" if all the numerical variables were on the same scale. All I want to do is plot each variable as its own box on the same plot, like the base R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box, and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), and again all the boxes are on the same scale. What am I missing?

chris
  • `ggplot` requires data in long format. You need to convert your data to long format with, e.g., `tidyr::gather` or `reshape2::melt`. This will not demo well on `mtcars` since (a) mtcars doesn't have ID variables for the x axis (though we could convert the rownames to a column) and (b) it wouldn't look very nice with some discrete data and almost nothing on the same scale. But if you get your data in long format, your ggplot should be as easy as `ggplot(long_data, aes(x = variable, y = value)) + geom_boxplot()`. – Gregor Thomas Aug 24 '16 at 23:33
  • Basically, imagine mtcars had 75 vehicle models and 10 columns, each holding cylinders for a different year, so it covered 1986 to 1995. In base R I would just write: – chris Aug 24 '16 at 23:53
  • SORRY, in base R I would just write something like `boxplot(mtcars$cyl1986, mtcars$cyl1987...)` and so on. But I can't for the life of me do this simple boxplot in ggplot or qplot. I know it's because it's a more advanced package, but still. – chris Aug 24 '16 at 23:56

1 Answer


Though I don't think mtcars makes a great example for this, here it is:

First, we make the data (hopefully) more similar to yours by using a column instead of rownames.

mt = mtcars
mt$car = row.names(mtcars)

Then we reshape to long format:

mt_long = reshape2::melt(mt, id.vars = "car")

Then the plot is easy:

library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
    geom_boxplot()


Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather. I'd recommend reading the Tidy Data paper for more on this topic.
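To make the wide-versus-long distinction concrete, here is a minimal sketch of the same reshape done with `tidyr::pivot_longer`, which newer tidyr releases (1.0.0 and later) offer in place of `gather`. Swapping it in for `melt` is my substitution, not a requirement. One difference worth knowing: `pivot_longer` returns the variable names as plain character strings, so the sketch converts them to a factor to keep the boxes in the original column order.

```r
library(tidyr)
library(ggplot2)

# Same setup as above: move the row names into a regular column
mt <- mtcars
mt$car <- row.names(mtcars)

# Stack every column except `car` into name/value pairs,
# giving one row per car per variable
mt_long <- pivot_longer(
  mt,
  cols = -car,
  names_to = "variable",
  values_to = "value"
)

# Keep the boxes in the original column order rather than alphabetical order
mt_long$variable <- factor(mt_long$variable, levels = names(mtcars))

dim(mt)       # wide: 32 rows, 12 columns
dim(mt_long)  # long: 352 rows, 3 columns

p <- ggplot(mt_long, aes(x = variable, y = value)) +
  geom_boxplot()
print(p)
```

The wide table has one column per variable; the long table has a single `variable` column to map to the x-axis and a single `value` column to map to the y-axis, which is the shape `aes()` expects.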

Gregor Thomas
  • Basically, imagine mtcars had 75 vehicle models and 10 columns, each holding only cylinders for a different year, so it covered 1986 to 1995. In base R I would just write something like `boxplot(mtcars$cyl1986, mtcars$cyl1987...)` and so on. But I can't for the life of me do this simple boxplot in ggplot or qplot. I know it's because it's a more advanced package, but still. I tried this code and got something very different. (I can't figure out how to attach it to this comment. Such a noob.) – chris Aug 25 '16 at 01:19
  • Code shouldn't go in comments - it's very cramped. What you should do is reproducibly share some of your actual data. Put, say `dput(droplevels(head(your_data, 20)))` in your question. – Gregor Thomas Aug 25 '16 at 01:21
  • That said, if you open a new R session and run the code I show - assuming you have relatively current versions of `ggplot2` and `reshape2`, you should match my output exactly. There's no smoke and mirrors here. I copy/pasted my code and plot into this answer. – Gregor Thomas Aug 25 '16 at 01:22
  • I'm very much out of my league here, so I thank you immensely for your willingness to help me. I may have to just keep my base R plot, because I simply can't figure this out. I don't know how to paste anything further in this comment section (let the downvotes continue!), but at best I can just describe my dataset. I have a csv file with 13 columns and 75 rows of data. The rows are locations with mercury readings, so 75 different locations. The columns are mercury readings for each month (the first column is just each location name). – chris Aug 25 '16 at 02:12
  • All I want to do is make a prettier version of the boxplot link I posted in my original question. A ggplot boxplot with each month's mercury observations on the x-axis. The y axis is simply the mercury value. Each box in the overall boxplot is a boxplot of each month's 75 mercury observations. Again, Gregor, thank you so much, but I may just have to chalk this up to a learning experience. – chris Aug 25 '16 at 02:14
  • I'd love to help you. You said your data is like `mtcars` - I showed you how to do it with `mtcars`. You said it didn't work but didn't give any more details. If you post your data, I can demo on your data, but unless you do that I can't do anything more. `ggplot2` is particular about data formats. *You will have to reshape your data to effectively use it*. – Gregor Thomas Aug 25 '16 at 03:43
  • You're very kind and you are correct. Admittedly, I shouldn't have said my data was like mtcars, though I did qualify how it's different. My apologies. I was able to get your script to work, but at the same time I don't know how to apply those changes to my data. Mostly because I said use mtcars and I shouldn't have! My data is closer to the "nottem" dataset. However, all I want to plot are the months, with their value range on the y-axis. Just a boxplot with 12 boxes showing the range of each month for the x-axis and the range value on the y. Does that help? – chris Aug 25 '16 at 12:31
  • In addition: tweaking the code you pasted I can get everything up but the actual boxes in the boxplot. I get the background grid, the proper value range on the y-axis and the proper labels on the x-axis...but the whole thing is blank. Ggplot is beautiful but it sure isn't intuitive. I may just have to get a tutorial somewhere as I don't want to keep bothering you over something that is seemingly so basic. – chris Aug 25 '16 at 13:01
  • (a) You should definitely read other tutorials as well. (b) Data types matter. Posting `dput(droplevels(head(your_data, 20)))` would let me know *exactly* what your data is like. `nottem` is a strange data set - it's a `ts` object, not a matrix or a `data.frame`. Is this what your data is like, or do you have a data frame? If you can't share your data, can you at least simulate data that has the same structure? [See here for many tips on making reproducible examples](http://stackoverflow.com/q/5963269/903061). – Gregor Thomas Aug 25 '16 at 15:01
  • Gregor, I want to thank you for helping me first and very fast last night. I've got everything working well now as of this morning and it's because of you. You rock! – chris Aug 25 '16 at 15:17
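
For anyone landing here with data shaped like the CSV described in the comments (75 locations in rows, a location-name column, then one column per month), here is a rough sketch of how the answer's recipe might apply. The actual data was never posted, so everything below is an assumption: the values are simulated with `runif`, and the names `hg_wide`, `hg_long`, `location`, `month` and `mercury` are made up for illustration.

```r
library(ggplot2)

set.seed(1)  # simulated values only, so the example is reproducible

# Hypothetical stand-in for the CSV: 75 locations, one column per month.
# With real data this would be something like read.csv() on your own file.
hg_wide <- data.frame(
  location = paste0("site_", 1:75),
  matrix(runif(75 * 12), nrow = 75, dimnames = list(NULL, month.abb))
)

# Reshape to long format: one row per location per month.
# melt() keeps the month columns as factor levels in their original order.
hg_long <- reshape2::melt(
  hg_wide,
  id.vars = "location",
  variable.name = "month",
  value.name = "mercury"
)

str(hg_long)  # 900 rows: location, month (factor), mercury (numeric)

# One box per month, each summarising that month's 75 readings
p <- ggplot(hg_long, aes(x = month, y = mercury)) +
  geom_boxplot()
print(p)
```

The only parts that should need changing for a real file are the step that builds `hg_wide` and the name of the ID column passed to `id.vars`.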