
I have a vector of chromosome names

q<-c("1","10","11","12","13","14","15","16","17",
     "18","19","20","21","22","2","3","4","5","6",
     "7","8","9","X","Y","M")

I want to sort them as

q<-c("1","2","3","4","5","6","7","8","9","10","11",
     "12","13","14","15","16","17","18","19","20",
     "21","22","X","Y","M")

I tried to make my own order

chrOrder <-c((1:22),"X","Y","M")

and use it like

factor(cbind(q),levels=chrOrder)

But I still couldn't get it to work.

Edit: I have a similar but slightly more advanced scenario. I have a data frame with three columns: name, chromosome, and start.

df <-data.frame(name =c("a","a","a","b","b","b"), chrom = c(1,2,10,1,3,"X"), start=c(100,200,300,500,300,200))

I need to sort it first by name, then by chromosome, and then by start. The result should look like this:

name chrom start
a    1     100
a    10    300
a    2     200
b    1     500
b    3     300
b    X     200

I don't know how to use chrOrder in the following:

indata  <- df[do.call(order,df[,c(name, chrom, start)]),];
user1631306
  • Why not use `chrOrder` directly? Why do you expect that `factor` sorts your vector? Likewise, `cbind` has no effect here. – Konrad Rudolph Sep 25 '13 at 20:29
  • I'm confused by the edited question's desired result. Is it a mistake that the third row is not before the second row? – blakeoft Sep 26 '14 at 14:42
  • It's sorted first by "name", then by "chrom", where the numbers are not sorted in natural order; it's like 1, 10, 100, 2, 200, 22, 299, 300. – user1631306 Sep 30 '14 at 05:20
  • Related post: https://stackoverflow.com/q/12806128/680068 – zx8754 Aug 01 '19 at 09:32

2 Answers


Your approach is good; you just need to sort the resulting factor. You should also set ordered=TRUE:

sort(factor(q,levels=chrOrder, ordered=TRUE))

No, you don't have to use an ordered factor, as has been pointed out, but it's certainly not wrong, and it's arguably better. Factors are meant for exactly this kind of situation, where you have a well-defined set of levels. See this previous question on factor vs. character.
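As a small check of that point, here is a sketch using the `q` and `chrOrder` vectors from the question. `sort()` follows the level order for both a plain and an ordered factor, and `as.character()` turns the result back into a character vector. The ordered factor additionally supports comparisons such as `<=`; the "autosomes" filter is only an illustration.

```r
q <- c("1","10","11","12","13","14","15","16","17",
       "18","19","20","21","22","2","3","4","5","6",
       "7","8","9","X","Y","M")
chrOrder <- c(1:22, "X", "Y", "M")

# Plain (unordered) factor: sort() still follows the level order
plain <- sort(factor(q, levels = chrOrder))

# Ordered factor: same sort order, plus support for <, <=, > etc.
ord <- sort(factor(q, levels = chrOrder, ordered = TRUE))

# Convert back to a character vector if the factor itself isn't needed
sorted_q <- as.character(ord)
print(sorted_q)

# Comparisons are only meaningful for the ordered factor,
# e.g. keep everything up to and including level "22"
autosomes <- ord[ord <= "22"]
```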

Now that you've edited your question, the case for a factor is even stronger because sorting is simple:

df <- data.frame(name=c("a","a","a","b","b","b"),
                 chrom = c(1,2,10,1,3,"X"),
                 start=c(100,200,300,500,300,200))

chrOrder <-c((1:22),"X","Y","M")
df$chrom <- factor(df$chrom, chrOrder, ordered=TRUE)

df[do.call(order, df[, c("name", "chrom", "start")]), ]

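Given the levels of the factor, R knows exactly how to sort the elements: order() ranks a factor by the position of its levels, not alphabetically, so chrom sorts as 1, 2, 10 rather than 1, 10, 2.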
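For comparison, here is a sketch on the question's data that keeps chrom as plain character. In that case order() compares strings, so "10" sorts before "2". That happens to be the order shown in the question's expected output. The ordered factor instead follows chrOrder. `stringsAsFactors = FALSE` is added so the sketch behaves the same in R versions before and after 4.0.

```r
df <- data.frame(name  = c("a","a","a","b","b","b"),
                 chrom = c(1,2,10,1,3,"X"),
                 start = c(100,200,300,500,300,200),
                 stringsAsFactors = FALSE)
chrOrder <- c(1:22, "X", "Y", "M")

# Character column: lexicographic comparison, so "10" < "2"
by_string <- df[order(df$name, df$chrom, df$start), ]

# Ordered factor: comparison by level position in chrOrder
df$chrom <- factor(df$chrom, levels = chrOrder, ordered = TRUE)
by_level <- df[order(df$name, df$chrom, df$start), ]

print(as.character(by_string$chrom))
print(as.character(by_level$chrom))
```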

I've followed your lead with the sorting method, but you might like to know that there are prettier ways, e.g.:

library(plyr)
df <- arrange(df, name, chrom, start)
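If you would rather leave the chrom column untouched, a base-R alternative is to pass each chromosome's position in chrOrder to order() as the sort key, computed with match(). This is only a sketch. The row shuffle is there purely so that the effect of sorting is visible, because the question's data frame is already in natural order.

```r
df <- data.frame(name  = c("a","a","a","b","b","b"),
                 chrom = c(1,2,10,1,3,"X"),
                 start = c(100,200,300,500,300,200),
                 stringsAsFactors = FALSE)
chrOrder <- c(1:22, "X", "Y", "M")

# Shuffle the rows so the effect of sorting is visible
shuffled <- df[c(6, 3, 5, 1, 4, 2), ]

# match() maps each chromosome label to its position in chrOrder;
# order() uses that integer as the second sort key, so the chrom
# column itself stays plain character
sorted_df <- shuffled[order(shuffled$name,
                            match(shuffled$chrom, chrOrder),
                            shuffled$start), ]
print(sorted_df)
```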
Peyton

factor and cbind don’t do anything here (well, factor does, but it’s not immediately useful).

In your specific case, just saying q <- chrOrder solves the problem, doesn’t it?

More generally, you can use match to get the indices of items in a vector x ordered by the items in another vector y:

> match(chrOrder, q)
 [1]  1 15 16 17 18 19 20 21 22  2  3  4  5  6  7  8  9 10 11 12 13 14 23 24 25

Now you can use those indices to index into q and get it ordered:

> q[match(chrOrder, q)]
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "X"  "Y"  "M"

… so this is the general approach. As a more useful example, assuming that you actually have a data frame of genes with a chr column, you could order the rows of the data frame as follows:

> # Some test data
> df <- data.frame(chr = q, value = rnbinom(length(q), 1, 0.01),
+                  row.names = paste('gene', seq_along(q)))
> # order(match(...)) keeps every row, even if a chromosome occurs more than once
> df <- df[order(match(df$chr, chrOrder)), ]
> head(df)
        chr value
gene 1    1   270
gene 15   2    51
gene 16   3   115
gene 17   4    15
gene 18   5   196
gene 19   6    34

… the rows of the data frame are now ordered by its chr column, in the order you wanted.
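A caveat on the forward form `x[match(y, x)]` is that it returns one element per entry of `y`. If `x` lacks some chromosomes, the result contains `NA`s, and if a chromosome is repeated, only its first copy is kept. The sketch below uses the reverse form, `x[order(match(x, y))]`. It ranks each element of `x` by its position in `y`, so every element is kept exactly once. `q2` is made-up test data.

```r
chrOrder <- c(1:22, "X", "Y", "M")

# Hypothetical input: not every chromosome present, one repeated
q2 <- c("10", "2", "X", "2", "1")

# Forward indexing: one slot per chrOrder entry, so NAs appear
# for missing chromosomes and the second "2" is lost
forward <- q2[match(chrOrder, q2)]

# Reverse idiom: rank each element of q2 by its position in chrOrder
reverse <- q2[order(match(q2, chrOrder))]
print(reverse)
```

Any element of `x` that does not appear in `chrOrder` gets `NA` from `match()`, and `order()` places it last because of its default `na.last = TRUE`.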

Konrad Rudolph