2

To be clear, I am looking for a simple way to add a histogram or density plot, rotated by 90 degrees, whose x-axis aligns with the y-axis of the example plot given below.

library(ggplot2)
library(tibble)

x <- seq(100)
y <- rnorm(100)

my_data <- tibble(x = x, y = y)
ggplot(data = my_data, mapping = aes(x = x, y = y)) +
  geom_line()

Created on 2019-01-28 by the reprex package (v0.2.1)

Ramiro Magno

4 Answers

4

You can try geom_histogram or geom_density, but that is a little complicated because you have to rotate the axes for them (while keeping the original orientation for geom_line). I would use geom_violin (which is a density plot, but mirrored). If you want only a one-sided violin plot, you can use the custom geom_flat_violin geom. It was first posted by @David Robinson in his gists.

I used this geom in a different answer, but I don't think this is a duplicate, because here you need to put it at the end of the plot and combine it with a different geom.

Final code is:

library(ggplot2)
ggplot(data.frame(x = seq(100), y = rnorm(100))) +
    geom_flat_violin(aes(100, y), color = "red", fill = "red", alpha = 0.5, width = 10) +
    geom_line(aes(x, y))
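
For comparison, here is a minimal sketch of the mirrored (two-sided) variant using the built-in geom_violin, which needs no custom geom. The seed, the `df` and `p_violin` names, and the `group = 1` aesthetic are my own assumptions for illustration; the other arguments are copied from the call above.

```r
library(ggplot2)

set.seed(1)  # assumption: seed chosen only to make the sketch reproducible
df <- data.frame(x = seq(100), y = rnorm(100))

# Built-in geom_violin(): a mirrored density centred on x = 100.
# group = 1 makes ggplot2 treat all rows as a single violin
# even though x is continuous.
p_violin <- ggplot(df) +
  geom_violin(aes(x = 100, y = y, group = 1),
              color = "red", fill = "red", alpha = 0.5, width = 10) +
  geom_line(aes(x, y))

p_violin
```

Half of the violin extends past x = 100, which is why the one-sided geom_flat_violin is the better fit for a right-hand margin.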


geom_flat_violin code:

library(dplyr)

"%||%" <- function(a, b) {
  if (!is.null(a)) a else b
}

geom_flat_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
                        position = "dodge", trim = TRUE, scale = "area",
                        show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomFlatViolin,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      trim = trim,
      scale = scale,
      ...
    )
  )
}


GeomFlatViolin <-
  ggproto(
    "GeomFlatViolin",
    Geom,
    setup_data = function(data, params) {
      data$width <- data$width %||%
        params$width %||% (resolution(data$x, FALSE) * 0.9)

      # ymin, ymax, xmin, and xmax define the bounding rectangle for each group
      data %>%
        dplyr::group_by(.data = ., group) %>%
        dplyr::mutate(
          .data = .,
          ymin = min(y),
          ymax = max(y),
          xmin = x,
          xmax = x + width / 2
        )
    },

    draw_group = function(data, panel_scales, coord)
    {
      # Find the points for the line to go all the way around
      data <- base::transform(data,
                              xminv = x,
                              xmaxv = x + violinwidth * (xmax - x))

      # Make sure it's sorted properly to draw the outline
      newdata <-
        base::rbind(
          dplyr::arrange(.data = base::transform(data, x = xminv), y),
          dplyr::arrange(.data = base::transform(data, x = xmaxv), -y)
        )

      # Close the polygon: set first and last point the same
      # Needed for coord_polar and such
      newdata <- rbind(newdata, newdata[1,])

      ggplot2:::ggname("geom_flat_violin",
                       GeomPolygon$draw_panel(newdata, panel_scales, coord))
    },

    draw_key = draw_key_polygon,

    default_aes = ggplot2::aes(
      weight = 1,
      colour = "grey20",
      fill = "white",
      size = 0.5,
      alpha = NA,
      linetype = "solid"
    ),

    required_aes = c("x", "y")
  )
pogibas
  • For having the plots in the same coordinate y-axis [(+1)]. – Ramiro Magno Jan 28 '19 at 14:04
  • @plant is there any way I can improve my answer? – pogibas Jan 28 '19 at 14:08
  • Yes! :D Can you make `geom_flat_violin` function have an argument to allow *left side*, *right side* and *normal* violin plots? Right side mode would be ideal for a right side plot. – Ramiro Magno Jan 28 '19 at 14:11
  • @plant it's not really what you wanted, but I understand the problem. Changed geom to generate right side plot. – pogibas Jan 28 '19 at 14:16
  • although your answer is my preferred one I have to be fair and recognize that jay.sf's answer did answer exactly what I asked... So I will accept his/her answer as the right one. – Ramiro Magno Jan 28 '19 at 14:22
4

I'd try it with either geom_histogram or geom_density, the patchwork library, and dynamically setting limits to match the plots.

Rather than manually setting limits, get the range of y-values, set that as the limits in scale_y_continuous or scale_x_continuous as appropriate, and add some padding with expand_scale. The first plot is the line plot, and the second and third are distribution plots, with the axes flipped. All have the scales set to match.

library(ggplot2)
library(tibble)
library(patchwork)

y_range <- range(my_data$y)

p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) +
  geom_line() +
  scale_y_continuous(limits = y_range, expand = expand_scale(mult = 0.1))

p2_hist <- ggplot(my_data, aes(x = y)) +
  geom_histogram(binwidth = 0.2) +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))

p2_dens <- ggplot(my_data, aes(x = y)) +
  geom_density() +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
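
A note for newer installations: expand_scale() was deprecated in ggplot2 3.3.0 in favour of expansion(), which takes the same mult argument. Below is a minimal sketch of the density panel with the newer call. The seed and the `p2_dens_new` name are assumptions added so the block runs on its own.

```r
library(ggplot2)
library(tibble)

set.seed(1)  # assumption: seed added only for reproducibility
my_data <- tibble(x = seq(100), y = rnorm(100))
y_range <- range(my_data$y)

# Same panel as p2_dens, with expansion() in place of expand_scale()
p2_dens_new <- ggplot(my_data, aes(x = y)) +
  geom_density() +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expansion(mult = 0.1))

p2_dens_new
```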

patchwork allows you to simply add plots to each other, then add the plot_layout function where you can customize the layout.

p1 + p2_hist + plot_layout(nrow = 1)

p1 + p2_dens + plot_layout(nrow = 1)

I've generally seen these types of plots where the distribution is shown in a "marginal" plot, that is, set up to be secondary to the main (in this case, line) plot. The ggExtra package has a marginal plot, but it only seems to work where the main plot is a scatterplot.
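
For reference, this is roughly what the ggExtra route looks like when the main plot is a scatterplot. It is a sketch, not a solution for the line plot in the question: the seed and the `p_scatter` and `p_marginal` names are assumptions, and geom_point stands in for geom_line.

```r
library(ggplot2)
library(ggExtra)

set.seed(1)  # assumption: seed added only for reproducibility
my_data <- data.frame(x = seq(100), y = rnorm(100))

# ggMarginal() is documented as taking a ggplot2 scatterplot
p_scatter <- ggplot(my_data, aes(x = x, y = y)) +
  geom_point()

# margins = "y" draws only the marginal for the y variable
p_marginal <- ggMarginal(p_scatter, type = "histogram", margins = "y")

p_marginal
```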

To do this styling manually, I'm setting theme arguments on each plot inline as I pass them to plot_layout. I took off the axis markings from the histogram so its left side is clean, and shrunk the margins on the sides of the two plots that meet. In plot_layout, I'm scaling the widths so the histogram appears more in the margins of the line chart. The same could be done with the density plot.

(p1 +
    theme(plot.margin = margin(r = 0, unit = "pt"))
) + 
  (p2_hist + 
     theme(axis.text.y = element_blank(), 
           axis.ticks.y = element_blank(),
           axis.title.y = element_blank(),
           plot.margin = margin(l = 0, unit = "pt"))
   ) + 
  plot_layout(nrow = 1, widths = c(1, 0.2))
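
The same styling applied to the density plot might look like the sketch below. It rebuilds the data and both plots so it runs on its own. The seed and the `combined` name are assumptions, and expansion() is used as the ggplot2 3.3.0+ replacement for expand_scale().

```r
library(ggplot2)
library(tibble)
library(patchwork)

set.seed(1)  # assumption: seed added only for reproducibility
my_data <- tibble(x = seq(100), y = rnorm(100))
y_range <- range(my_data$y)

# Line plot and flipped density plot share the same limits and padding
p1 <- ggplot(my_data, aes(x = x, y = y)) +
  geom_line() +
  scale_y_continuous(limits = y_range, expand = expansion(mult = 0.1))

p2_dens <- ggplot(my_data, aes(x = y)) +
  geom_density() +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expansion(mult = 0.1))

# Strip the density plot's vertical axis and close the gap between panels
combined <- (p1 + theme(plot.margin = margin(r = 0, unit = "pt"))) +
  (p2_dens +
     theme(axis.text.y = element_blank(),
           axis.ticks.y = element_blank(),
           axis.title.y = element_blank(),
           plot.margin = margin(l = 0, unit = "pt"))) +
  plot_layout(nrow = 1, widths = c(1, 0.2))

combined
```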

Created on 2019-01-28 by the reprex package (v0.2.1)

camille
  • damn it! you just posted your answer right after I accepted the other... – Ramiro Magno Jan 28 '19 at 14:24
  • The most comprehensive answer so far. You nailed it: the concept of marginal plot. That's exactly what I wanted. – Ramiro Magno Jan 28 '19 at 14:25
  • To be fair to other answerers, it wasn't totally clear from your question that you wanted a marginal plot. I guessed it from the fact that that's how I've generally seen these – camille Jan 28 '19 at 14:27
2

You could use egg::ggarrange(). So basically what you want is this:

p <- ggplot(data=my_data, mapping=aes(x=x, y=y)) +
  geom_line() + ylim(c(-2, 2))
q <- ggplot(data=my_data, mapping=aes(x=y)) +
  geom_histogram(binwidth=.05) + coord_flip() + xlim(c(-2, 2))

egg::ggarrange(p, q, nrow=1)


Data

set.seed(42)
my_data <- data.frame(x=seq(100), y=rnorm(100))
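
One caveat with fixed limits: ylim() and xlim() set scale limits, so observations outside c(-2, 2) are dropped with a warning. A possible variant, sketched below, takes the limits from the data and applies them as coordinate limits, which zoom without discarding anything. This swaps in coord_cartesian() and coord_flip(xlim = ...) for the ylim()/xlim() calls above, and the `y_lims` name is hypothetical.

```r
library(ggplot2)

set.seed(42)
my_data <- data.frame(x = seq(100), y = rnorm(100))

# Assumption: derive the shared limits from the data instead of fixing c(-2, 2)
y_lims <- range(my_data$y)

# Coordinate limits zoom the view; no observations or bars are removed
p <- ggplot(my_data, aes(x = x, y = y)) +
  geom_line() + coord_cartesian(ylim = y_lims)
q <- ggplot(my_data, aes(x = y)) +
  geom_histogram(binwidth = 0.05) + coord_flip(xlim = y_lims)

egg::ggarrange(p, q, nrow = 1)
```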
jay.sf
-3
library(ggplot2)
library(grid)

# plyr::count() tabulates each distinct y value into a `freq` column
my_data1 <- plyr::count(my_data, vars = c("y"))
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) + geom_line()
p2 <- ggplot(my_data1, aes(x = freq, y = y)) + geom_line() + theme(axis.title.y = element_blank(), axis.text.y = element_blank())
grid.draw(cbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))


  • The OP wanted a histogram or density plot to show the distribution of values, not just a line of the included range – camille Jan 28 '19 at 13:52
  • The random data I generated had a frequency of one for every value, which is why it's a line. But the above code can generate a plot similar to those posted by others, and I don't understand how it's different from the other answers. Please explain. Thanks – Gyan Prakash Mishra Jan 28 '19 at 16:10
  • It's different because a histogram or density plot isn't simply how many times a specific value appears—they have bins across which frequencies are calculated. In this case, you wouldn't necessarily expect, say, 1.4071 exactly to appear more than once, but you would expect more than one observation that's between 1.2 and 1.4, or however your bins are set. It's just the definition of a histogram – camille Jan 28 '19 at 16:17
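
The distinction in the last comment can be sketched in base R. The seed, the 0.2 bin width, and the variable names are assumptions chosen for illustration.

```r
set.seed(42)
y <- rnorm(100)

# Counting exact values: continuous draws are effectively all distinct,
# so every frequency is 1 and the "distribution" comes out flat.
exact_counts <- table(y)

# Histogram-style counting: cut the range into bins of width 0.2 and
# count how many observations fall into each bin.
breaks <- seq(floor(min(y)), ceiling(max(y)), by = 0.2)
binned_counts <- table(cut(y, breaks = breaks, include.lowest = TRUE))

max(exact_counts)   # largest frequency of any single exact value
max(binned_counts)  # largest number of observations in any one bin
```

Plotting `binned_counts` against the bin midpoints gives the histogram shape, whereas plotting `exact_counts` gives the flat line seen in the last answer.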