I have the following plot, but the x-axis labels are too small to read. If I increase their size, they overlap and become illegible.

X axis labels too small

I have tried the code from the following link

but I get the error 'wrong sign in 'by' argument'.

I also tried the following link: How to show every second R ggplot2 x-axis label value?

As a beginner, though, I couldn't quite follow the code, and I am not sure whether there is an added complication with the labels being dates.

I have a df of 100 genes across 170 samples + a 'Genes' column. I then use the following code to make the data long:

library(dplyr)
library(tidyr)

# Reshape to long format: one row per gene/sample pair
mat_dataNew <- mat_data %>% gather(sample, expression, -Genes)
# log10-transform the counts data
mat_dataNew <- mutate(mat_dataNew, log10_NormCounts = log10(expression))

I then plot with:

ggplot(mat_dataNew, aes(x = reorder(Genes, -log10_NormCounts), y = log10_NormCounts, colour = Group)) +
  geom_point(size = 0.5) +
  theme(axis.text.x = element_text(angle = 90, size = 4)) +
  labs(x = "100 most abundant Genes", y = "Gene Expression (Log10 Normalised Counts)")

I have tried reading about scale_x_continuous but couldn't find an answer. Could you please suggest some code so that the x-axis labels are legible? I thought the easiest option might be to display every other label.

  • How about just using coord_flip() to put the text on the y-axis and make it a bit more readable? https://ggplot2.tidyverse.org/reference/coord_flip.html – Anakin Skywalker Mar 06 '19 at 23:21
  • Do you really need to show 100 genes? From the plot you showed, looks like a lot of them are pretty similar – Tung Mar 06 '19 at 23:37
  • And you want `scale_x_discrete` not `scale_x_continuous` – Tung Mar 06 '19 at 23:38
  • fyi you can use `fct_lump()` to lump similar genes into a new “other” level https://r4ds.had.co.nz/factors.html#modifying-factor-levels – Tung Mar 06 '19 at 23:48

1 Answer

You need scale_x_discrete, as Tung said in the comments.

scale_x_discrete takes the following three arguments, among others (see the documentation):

  1. limits - decides which values appear on the axis and in what order
  2. breaks - decides which of those values get a tick on the axis
  3. labels - decides what text each tick displays as its label

(In your case, since your gene names are already factors, you do not need the labels argument.)
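
To make the three arguments concrete, here is a minimal sketch on made-up data. The data frame `toy`, its columns, and the vectors `gene_limits`, `gene_breaks` and `gene_labels` are hypothetical names used only for illustration; they are not from the question.

```r
library(ggplot2)

# Hypothetical toy data: four categories with one value each
toy <- data.frame(gene = c("a", "b", "c", "d"), value = c(4, 3, 2, 1))

gene_limits <- c("d", "c", "b", "a")  # limits: which values appear, and in what order
gene_breaks <- c("d", "b")            # breaks: which of those values get a tick
gene_labels <- c("Gene D", "Gene B")  # labels: text shown at each break, one per break

p <- ggplot(toy, aes(x = gene, y = value)) +
  geom_point() +
  scale_x_discrete(limits = gene_limits, breaks = gene_breaks, labels = gene_labels)
p
```

All four points are still drawn, because every value is in limits; only "d" and "b" get a tick and a label. If labels is left out, the break values themselves are used as the tick text.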

Set the limits argument to all of the genes, set the breaks argument to every other gene (SO ref), and add the scale_x_discrete to your ggplot.

# Here Genes is a vector of the gene names in plotting order
ggplot(...) +
  scale_x_discrete(limits = Genes,
                   breaks = Genes[seq(1, length(Genes), by = 2)]) +
  ...

Note: you seem to be reordering your Genes column inside the ggplot call, so you will need to apply the same reordering to the vector you pass to the limits argument for this to work.
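
As a rough end-to-end sketch of the approach, the example below uses simulated data in place of mat_dataNew. The data frame `sim` and the helper vectors `gene_order` and `every_other` are hypothetical names introduced here, and the Group colouring from the question is left out because that column is not shown in the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical long-format data standing in for mat_dataNew: 10 genes x 5 samples
sim <- data.frame(
  Genes = rep(paste0("gene", 1:10), each = 5),
  log10_NormCounts = rep(seq(5, 0.5, by = -0.5), each = 5) + runif(50, 0, 0.1)
)

# Same ordering that reorder() produces inside aes():
# genes sorted by decreasing mean expression
gene_order <- levels(reorder(sim$Genes, -sim$log10_NormCounts))

# Keep every second gene as a labelled break
every_other <- gene_order[seq(1, length(gene_order), by = 2)]

p <- ggplot(sim, aes(x = reorder(Genes, -log10_NormCounts), y = log10_NormCounts)) +
  geom_point(size = 0.5) +
  scale_x_discrete(limits = gene_order, breaks = every_other) +
  theme(axis.text.x = element_text(angle = 90, size = 8)) +
  labs(x = "Genes", y = "Gene Expression (Log10 Normalised Counts)")
p
```

Computing gene_order with the same reorder() call as in aes() keeps limits and the plotted order in sync. As a side note, seq() raises 'wrong sign in 'by' argument' when the end point is smaller than the start point, for example seq(1, 0, by = 2) on an empty vector, so one possible cause of the error in the question is that the vector passed to length() was empty or not the one intended.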