I have a data frame with too many values on the x-axis, so I would like to thin out the x-axis breaks and take their labels from another column. I found a solution here, is-it-possible-to-have-a-continuous-line-with-geom-line-across-facets-with-facete, which works when I set by = 1 in seq(), but when I try a larger step I get an error.

The code below is modified from the linked example.

library(patchwork)
library(ggplot2)
library(scales)

df_graph_data = data.frame(
  year = c(
    rep.int("2020", times = 11),
    rep.int("2021", times = 12),
    rep.int("2022", times = 3)
  ),
  month_name = c(
    "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
    "January", "February", "March"
  ),
  month_number = c(
    "02", "03", "04", "05", "06", "07",
    "08", "09", "10", "11", "12", "01",
    "02", "03", "04", "05", "06", "07",
    "08", "09", "10", "11", "12", "01",
    "02", "03"
  ),
  number_of_queries = c(
    484819, 576697, 843015, 925175,
    1102853, 889212, 835706, 774622,
    701338, 850297, 1046064, 1273363,
    958868, 1088284, 1151606, 1666950,
    2025731, 2731704, 2429019, 3228395,
    3204915, 2612807, 2811946, 3053788,
    2589273, 2305433
  )
)

df_graph_data$rownum = 1:nrow(df_graph_data)

windows()
graph <- ggplot(df_graph_data) +
  geom_line(
    aes(x = rownum, y = number_of_queries),
    size = 1,
    colour = "blue",
    linetype = "solid"
  ) +
  scale_x_continuous(
    breaks = seq(
      min(df_graph_data$rownum),
      max(df_graph_data$rownum),
      by = 1
    ),
    labels = df_graph_data$month_number
  )

graph

This produces this graph:

[Figure: line plot of number_of_queries against rownum, with a month_number label at every x position]

The data set I have is much larger, so I would need by = 10 in seq(), but when I try this I get the following error: breaks and labels must have the same length.

I would like to find out if there is a way to introduce breaks based on one column and then set the labels from a corresponding column. For example, if the breaks fall at rownum 10, 20, 30, then each label should be the month_name that corresponds to that rownum.

neilfws
NosiMsomi

1 Answer

The idea of breaks and labels is rather straightforward: place labels[i] at position breaks[i].

If you want to space your labels further apart, you can use for instance this snippet:

brk_idx <- seq(
   min(df_graph_data$rownum),
   max(df_graph_data$rownum),
   by = 10
)

ggplot(df_graph_data) +   
   geom_line(aes(x = rownum,
                 y = number_of_queries), 
             linewidth = 1, colour = "blue",   
             linetype = "solid") +    
   scale_x_continuous(
      breaks = df_graph_data$rownum[brk_idx],
      labels = df_graph_data$month_number[brk_idx])

[Figure: line plot with breaks at self-defined positions]

It looks up the rows given by brk_idx and uses rownum as the position and month_number as the label at that position:

df_graph_data[brk_idx, c("rownum", "month_number")]
#    rownum month_number
# 1       1           02
# 11     11           12
# 21     21           10

That is, place "02" at position 1, "12" at position 11, and "10" at position 21. (N.B. brk_idx and df_graph_data$rownum[brk_idx] are identical here, because rownum is simply 1:nrow(df_graph_data).)
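The same lookup works for any label column, so the month_name labels asked for in the question could be produced along these lines. This is only a sketch: df is a stand-in for df_graph_data with placeholder values in number_of_queries, brk_labels is a name introduced here, and pasting the year onto the month name is an optional extra to tell repeated months apart. linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Stand-in for df_graph_data: same columns as in the question, but
# number_of_queries holds placeholder values, not the original data.
month_idx <- (seq_len(26) %% 12) + 1   # 2, 3, ..., 12, 1, 2, ...
df <- data.frame(
  rownum            = seq_len(26),
  year              = rep(c("2020", "2021", "2022"), times = c(11, 12, 3)),
  month_name        = month.name[month_idx],
  number_of_queries = seq(5e5, 3e6, length.out = 26)
)

# Rows at which a break should appear: 1, 11, 21
brk_idx <- seq(min(df$rownum), max(df$rownum), by = 10)

# One label per break, taken from the same rows as the break positions
brk_labels <- paste(df$month_name[brk_idx], df$year[brk_idx])

p <- ggplot(df, aes(x = rownum, y = number_of_queries)) +
  geom_line(linewidth = 1, colour = "blue") +
  scale_x_continuous(breaks = df$rownum[brk_idx], labels = brk_labels)
p
```

Because breaks and labels are both indexed by brk_idx, they always have the same length, whatever step is passed to seq().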

This also explains the error you got when you changed the by argument in seq to 10: you tried to place all month_numbers at positions 1, 11 and 21, so you had 26 labels but only 3 positions.
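The length mismatch can be checked without plotting anything. The sketch below rebuilds only the two relevant columns in base R; rownum and month_number mirror the question's data, while brks and labels_ok are names introduced here for illustration.

```r
# Rebuild the two relevant columns: 26 rows, months running 02, 03, ..., 12, 01, ...
rownum <- 1:26
month_number <- sprintf("%02d", (rownum %% 12) + 1)

# Break positions when the step is 10
brks <- seq(min(rownum), max(rownum), by = 10)

length(brks)          # 3 positions
length(month_number)  # 26 labels, hence the error

# Subsetting the labels by the same positions makes the lengths agree
labels_ok <- month_number[brks]
length(labels_ok) == length(brks)
```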

thothal