
I am trying to make a clean geom_col plot for a large number of samples. The sample names (which should go on the x-axis) are a mix of numbers and characters, since I include "N" for the negative control.

sample_names <- c(100,22,4,5,6,"N")
size <- c(3,2,3,4,2,3)

I would like the bars ordered from the lowest sample name to the highest (sample 4, then 5, 6, 22, 100), ending with N. Because mixing numbers with "N" makes the whole column character, it is sorted alphabetically and always starts with sample 100 (because "1-0-0" sorts before "2-2").

library(dplyr)
d <- data.frame(sample_names, size) %>%
     arrange(sample_names)  # sorts as character: "100" comes before "22"

As a result, the bars in the plot are not in a sensible order.

It would be nicer to have them in ascending order with the N at the end.

I already tried converting this column to numeric and replacing the resulting NA (which takes the place of the "N") with 0.

The problem is that the x-axis then becomes continuous, so the plot has huge gaps between the samples:

library(tidyr)  # for replace_na()
d <- data.frame(sample_names, size) %>%
   arrange(sample_names) %>%
   mutate(sample_names = as.numeric(sample_names)) %>%  # "N" becomes NA (with a warning)
   replace_na(list(sample_names = 0))


So my question is: do you know how to either sort a character column in the "correct" ascending order, OR how to close the gaps on the x-axis in ggplot2? Thank you.

CoDa
  • `ggplot(d, aes(reorder(sample_names, size), size)) + geom_col()` – Ronak Shah Jun 04 '21 at 09:21
  • Okay, I guess I was not clear enough. I meant an ascending order of the sample names in the plot, meaning first 4, then 5, 6, 22, 100, N. I will update my question. – CoDa Jun 04 '21 at 09:24
  • I see. In that case you need to arrange the factor levels in the order in which you want the bars on the x-axis: `d$sample_names <- factor(d$sample_names, c(4, 5, 6, 22, 100, 'N'))`, and then use the `ggplot2` code. – Ronak Shah Jun 04 '21 at 09:27
  • Thank you. Is there a handy version for when my data has a lot of rows? Doing it by hand is not very pleasant in that case. – CoDa Jun 04 '21 at 09:29

1 Answer

The order of the bars is controlled by the factor levels of the x variable. To generate the levels automatically, you can pick out the values that consist only of digits with a regex, convert them to numeric, sort them, and append the non-numeric values at the end.

num <- grep('^\\d+$', d$sample_names)

d$sample_names <- factor(d$sample_names, 
                 c(sort(unique(as.numeric(d$sample_names[num]))), 
                        unique(d$sample_names[-num])))

library(ggplot2)

ggplot(d, aes(sample_names, size)) + geom_col()
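
For many columns or data sets, the same steps can be wrapped in a small helper. This is only a sketch: the name `mixed_levels` is made up for illustration, and it assumes the numeric labels are plain non-negative integers written as digits. It uses a logical index from `grepl()` instead of negative positions, so it should also behave when no label is numeric (with `grep()`, `x[-integer(0)]` would return nothing).

```r
# Hypothetical helper: digit-only labels in numeric order, everything else last.
mixed_levels <- function(x) {
  x <- unique(as.character(x))
  is_num <- grepl("^\\d+$", x)  # logical index, safe even if nothing matches
  c(x[is_num][order(as.numeric(x[is_num]))],  # numeric labels, ascending
    sort(x[!is_num]))                         # remaining labels, alphabetical
}

# Data from the question
d <- data.frame(
  sample_names = c(100, 22, 4, 5, 6, "N"),
  size = c(3, 2, 3, 4, 2, 3)
)

d$sample_names <- factor(d$sample_names, levels = mixed_levels(d$sample_names))
levels(d$sample_names)
# "4" "5" "6" "22" "100" "N"
```

The resulting `d` can be passed to the `ggplot()` call unchanged.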

A simpler approach, as suggested by @Rui Barradas, is to use stringr::str_sort or gtools::mixedsort (either line on its own is enough):

d$sample_names <- factor(d$sample_names, stringr::str_sort(unique(d$sample_names), numeric = TRUE))

d$sample_names <- factor(d$sample_names, gtools::mixedsort(unique(d$sample_names)))
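
If you would rather not add a package, a base R variant along the same lines might look like the sketch below. It relies on `order()` placing `NA` last by default, so every label that `as.numeric()` cannot parse ends up after the numbers. Note that this is a slightly different rule from the regex above: anything `as.numeric()` accepts (for example "1e3") counts as a number here. The variable names `lv` and `f` are just for illustration.

```r
sample_names <- c(100, 22, 4, 5, 6, "N")  # coerced to character, as in the question

lv <- unique(sample_names)
# as.numeric() turns "N" into NA (the warning is suppressed on purpose);
# order() puts NA last, and `lv` itself breaks ties among non-numeric labels.
lv <- lv[order(suppressWarnings(as.numeric(lv)), lv)]

f <- factor(sample_names, levels = lv)
levels(f)
# "4" "5" "6" "22" "100" "N"
```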
Ronak Shah