I want the overlapping part of a histogram to show the bar divided by a diagonal line. For example, I wrote the following code:

library(ggplot2)

set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))

df <- data.frame(
  valores = c(grupo1, grupo2),
  grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2)))
)

# Create histogram
ggplot(df, aes(x = valores, fill = grupo)) +
  geom_histogram(binwidth = 1, color = "black", position = "identity", alpha=0.6) +
  labs(x = "Valores", y = "Frecuencia", fill = "Grupo") +
  scale_fill_manual(values = c("grupo1" = "blue", "grupo2" = "red")) +
  theme_minimal()

This code results in this plot: [image: result]. Note that the part that overlaps is a different color.

But I want the plot to be drawn like this (I modified it in Paint): [image: what I want]. Note that it shows both colors. Does anyone know how to draw the histogram as the last image shows?

A second example, following the same code:

    df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))

This is the result: [image: second example]. But I want this (modified in Paint): [image: what I want].

I have tried some geom_histogram arguments, such as changing position, but that does not work. I am looking for code that solves this. Thanks in advance.

  • In your example both groups have the same frequency (1) for valores = 15, so the column should have height 1. But what if it weren't the case? If grupo1 had 3 observations with valores = 15, and grupo2 had only 1, what would be the size of the column? – Ricardo Semião e Castro Jun 11 '23 at 19:05
  • Hello @RicardoSemiãoeCastro, I have added two observations to grupo1 with valores = 15, with the following code: `valores<-c(15,15) grupo<-c("grupo1", "grupo1") add3<-cbind(valores, grupo) df2<-rbind(df, add3) str(df2) df2$valores <- as.numeric(df2$valores)`. The same plot shows a column of height 3 at valores = 15. The lower part (the first third) of that column is "purple", as in this image: https://i.stack.imgur.com/XzIXB.png and the second and third parts are blue. Hope I made myself understood. – Simon Ulloa Rengifo Jun 11 '23 at 19:39
  • Ok, I got it, but now, in that new case (with `df2`), how would the graph with the diagonally split bar look? Because the "blue" part of the bar should have height = 3, whereas the red should have height = 1. So how do you want to combine these two bars into a single, diagonally split bar? Maybe edit your question with this new dataframe and create a new image of what you want. – Ricardo Semião e Castro Jun 11 '23 at 20:57
  • Thanks for the answer, I have already edited the question. – Simon Ulloa Rengifo Jun 11 '23 at 23:16
  • Ok, I edited my answer. I believe that it's very similar to what you wanted. – Ricardo Semião e Castro Jun 12 '23 at 01:30

1 Answer

ggplot2 doesn't work well with this kind of thing. This question is about adding textures to bar plots, which is similar to what you want, and even that is quite difficult.

I've come up with some really weird solutions; there might be a much easier approach that I don't know of.

Note: the dummy data (df2) is at the end.

Option 1 - bars with combined colors, but not combined heights

This is basically what you said you didn't want, but with an improvement: the combined bar doesn't have the sum of the heights. See, if we run:

ggplot(df2, aes(valores, fill = grupo)) +
  geom_histogram(binwidth = 1, color = "black", alpha=0.6)

The bar at valores = 15 will have count = 3 + 1, but we might prefer a count = 3 bar with another smaller count = 1 bar below. We can get that using position_dodge(), but requesting no actual dodge:

ggplot(df2, aes(valores, fill = grupo)) +
  geom_histogram(position = position_dodge(0), binwidth = 1, color = "black", alpha=0.6)

enter image description here

But we can also add a small amount of dodge. Using position_dodge(0.3):

enter image description here
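
For reference, a plot like the one above could be produced with the previous call, changing only the dodge width to 0.3. This is a minimal sketch that rebuilds the dummy data so it runs on its own; the name p_dodge is only for illustration.

```r
library(ggplot2)

# Dummy data, same as at the end of this answer
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

# Same histogram as before, but with a small dodge width instead of 0
p_dodge <- ggplot(df2, aes(valores, fill = grupo)) +
  geom_histogram(position = position_dodge(0.3), binwidth = 1, color = "black", alpha = 0.6)

p_dodge
```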

The problem with this solution is that it adds whitespace between the bars. There might be an option in geom_histogram that removes it, but I don't know of one. You can open a new question about it if you'd like.

Option 2 - faking bars using geom_area + geom_segment

You can build a new dataset with the values of the histogram, which gives you more flexibility to customize the bars. You could do that with hist(), but since we're using ggplot, I made individual histograms for each group and got their data with ggplot_build(). There might be a better way to do that; the important part is that at the end you have a dataset with the histogram values for each group.

library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)

df_area <- df2 %>%
  group_split(grupo) %>% # for each group
  map_dfr(function(df_group) { # apply the following function
    g <- ggplot(df_group, aes(valores)) +
      geom_histogram(binwidth = 1) # build a histogram

    ggplot_build(g)$data[[1]] %>% # get its data
      select(c(x, xmin, xmax, y)) %>% # select these columns
      mutate(grupo = unique(df_group$grupo)) %>% # and add a 'grupo' column
      pivot_longer(c(xmin, xmax), values_to = "x_area") # pivot the data in order to build columns with geom_area
  })
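
As a rough sketch of the hist() route mentioned above, the per-group counts could also be computed in base R. The shared breaks and the name hist_counts are my own assumptions, not something ggplot2 does for you; unlike the ggplot_build() version, every group here gets the same bins (including empty ones), and the result would still need the same pivot step before it can be used with geom_area.

```r
# Dummy data, same as at the end of this answer
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

binwidth <- 1
# Assumption: shared breaks centred on the integer values, so both groups use the same bins
breaks <- seq(min(df2$valores) - binwidth / 2, max(df2$valores) + binwidth / 2, by = binwidth)

# One histogram per group, computed without plotting
hist_counts <- do.call(rbind, lapply(split(df2, df2$grupo), function(df_group) {
  h <- hist(df_group$valores, breaks = breaks, plot = FALSE)
  data.frame(grupo = df_group$grupo[1],
             x = h$mids,                  # bin centre
             xmin = head(h$breaks, -1),   # left edge of each bin
             xmax = tail(h$breaks, -1),   # right edge of each bin
             y = h$counts)                # number of observations in the bin
}))

head(hist_counts)
```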

Now we can build the area of the histogram with geom_area, and the lines with geom_segment. Again we use position_dodge(0), but this time there is no whitespace:

ggplot(df_area, aes(x_area, y, fill = grupo)) +
  geom_area(position = position_dodge(0), alpha = 0.6, color = "black") +
  geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))

enter image description here

This might yield some weird boundaries between the area contour and the segments. Also, you can't use position_dodge(k) with k != 0 here.

Option 3 - geom_area + geom_segment with custom data

This is the closest to what you wanted. The idea is to change the data:

df_area2 <- df_area %>%
  mutate(y = case_when(grupo == "grupo1" ~ ifelse(name == "xmin", y, 0),
                       grupo == "grupo2" ~ ifelse(name == "xmax", y, 0)))

This is done in such a way as to produce the inclined bars:

enter image description here
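
To see why zeroing one edge per group gives inclined bars, here is a small base R sketch on a made-up single bin (the toy data frame and the y_inclined column are hypothetical, only for illustration). grupo1 keeps its height at the left edge only and grupo2 at the right edge only, so each group's area becomes a triangle across the bin:

```r
# Hypothetical single bin centred at 15: grupo1 has count 3, grupo2 has count 1
toy <- data.frame(
  grupo  = rep(c("grupo1", "grupo2"), each = 2),
  name   = rep(c("xmin", "xmax"), times = 2),
  x_area = rep(c(14.5, 15.5), times = 2),
  y      = c(3, 3, 1, 1)
)

# Same rule as the case_when() above: grupo1 keeps y at xmin, grupo2 keeps y at xmax
keep <- (toy$grupo == "grupo1" & toy$name == "xmin") |
  (toy$grupo == "grupo2" & toy$name == "xmax")
toy$y_inclined <- ifelse(keep, toy$y, 0)

toy
```

In this toy case grupo1 goes from height 3 at the left edge down to 0 at the right edge, and grupo2 goes from 0 up to 1.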

Then, we add that on top of the base graph of the last image:

ggplot(df_area, aes(x_area, y, fill = grupo)) +
  geom_area(position = position_dodge(0), color = "black") +
  geom_area(data = df_area2) + #on top of the base area, but below the lines
  geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))

enter image description here

Using alpha here would make the inclined bars show through. Instead, you can pass "washed out" colors to fill in order to match the tone you had before.
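
One possible way to get such "washed out" colors is to blend each color with white by hand. This is only a sketch: wash_out is a hypothetical helper, and it assumes a white plot background and the alpha = 0.6 used earlier.

```r
# Hypothetical helper: blend a color with a white background at a given opacity
wash_out <- function(col, alpha = 0.6) {
  rgb_col <- unname(grDevices::col2rgb(col)[, 1]) / 255  # channels scaled to 0..1
  blended <- pmin(alpha * rgb_col + (1 - alpha), 1)      # mix with white (1, 1, 1)
  grDevices::rgb(blended[1], blended[2], blended[3])
}

# Opaque fills that mimic blue and red drawn at alpha = 0.6 on white
fill_values <- c(grupo1 = wash_out("blue"), grupo2 = wash_out("red"))
fill_values
```

These could then be passed to scale_fill_manual(values = fill_values), as in the question's original code.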

Lastly, we can add lines at the end of the half bars:

binwidth <- 1

ggplot(df_area, aes(x_area, y, fill = grupo)) +
  geom_area(position = position_dodge(0), color = "black") +
  geom_area(data = df_area2) +
  geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area)) +
  geom_segment(aes(y = y, yend = y, x = x - 0.5*binwidth, xend = x + 0.5*binwidth))

Here, binwidth is the same one you used to create the histogram data.

enter image description here

Dummy data

set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))

df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))

df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))