
My lab is analyzing 30 environmental samples of NGS data. When I extracted the tar.bz2 archive, the fastq files came out in a different order in the new directory. I didn't think anything of it and assumed the order would simply be corrected as I progressed through my workflow. It wasn't. Now the sample columns in my frequency table are out of order. Here is an example of what I mean:

sample-B1 sample-B4 sample-B3 sample-B2 species
2 0 4 8 dog
14 3 10 9 cat

I want to reorder the sample columns as B1 to B4, like so:

sample-B1 sample-B2 sample-B3 sample-B4 species
2 8 4 0 dog
14 9 10 3 cat

This is a simplification. In my actual data I have 30 "sample" columns. Is there an easy way to accomplish this?

Thank you in advance for your time :)

Ronak Shah
Dan

1 Answer


We can use str_sort from stringr on the column names; numeric = TRUE makes it compare the digits as numbers rather than as characters

library(dplyr)
library(stringr)
df1 <- df1 %>%
   select(str_sort(names(.), numeric = TRUE))

-output

df1
#  sample-B1 sample-B2 sample-B3 sample-B4 species
#1         2         8         4         0     dog
#2        14         9        10         3     cat
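The numeric = TRUE argument starts to matter once there are ten or more samples, as in the real data with 30 columns. A small sketch of the difference, using hypothetical column names that are not in the question (the `cols`, `plain` and `natural` names are only for illustration):

```r
library(stringr)

# Hypothetical column names, extending the question's pattern past nine samples
cols <- c("sample-B10", "sample-B2", "sample-B1", "species")

# A plain character sort compares digit by digit, so "B10" lands before "B2"
plain <- sort(cols, method = "radix")

# numeric = TRUE compares runs of digits as numbers
natural <- str_sort(cols, numeric = TRUE)

print(plain)
print(natural)
```

With the plain sort, sample-B10 ends up between sample-B1 and sample-B2; with numeric = TRUE it comes after sample-B2.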

Or another option is mixedsort from gtools

df1 <- df1[gtools::mixedsort(names(df1))]
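If installing extra packages is not an option, roughly the same result can be had in base R. This is only a sketch: it assumes every sample column is named "sample-B" followed by a number, and the helper names (`is_sample`, `sample_cols`, `sample_num`, `sorted_samples`, `df2`) are made up for illustration. Any other columns, such as species, are kept after the samples.

```r
# Example data mirroring the question
df1 <- data.frame(
  `sample-B1` = c(2L, 14L),
  `sample-B4` = c(0L, 3L),
  `sample-B3` = c(4L, 10L),
  `sample-B2` = 8:9,
  species = c("dog", "cat"),
  check.names = FALSE  # keep the hyphens in the column names
)

# Assumption: sample columns match "sample-B<number>"
is_sample <- grepl("^sample-B[0-9]+$", names(df1))
sample_cols <- names(df1)[is_sample]

# Pull out the numeric suffix and order the sample columns by it
sample_num <- as.integer(sub("^sample-B", "", sample_cols))
sorted_samples <- sample_cols[order(sample_num)]

# Sample columns in numeric order, then the remaining columns
df2 <- df1[c(sorted_samples, names(df1)[!is_sample])]
print(df2)
```

Unlike sorting all names at once, this does not rely on the non-sample columns happening to sort after the "sample-" prefix.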

data

df1 <- structure(list(`sample-B1` = c(2L, 14L), `sample-B4` = c(0L, 
3L), `sample-B3` = c(4L, 10L), `sample-B2` = 8:9, species = c("dog", 
"cat")), class = "data.frame", row.names = c(NA, -2L))
akrun
  • Thanks for your response akrun! I tried out df1 <- df1[gtools::mixedsort(names(df1))] and that seemed to do the trick! Much appreciated :) – Dan May 01 '21 at 01:18