Order data.frame after numbers within character column in R

Question

I have the following data.frame

ID<-c("ID_1","ID_2","ID_3","ID_5","ID_1008","ID_6","ID_10")
SomethingElse<-c(5,6,7,1,2,3,1)
SomeText<-c("Thank","you","for","the","!","help","!")
df<-data.frame(ID,SomethingElse,SomeText)

What I need is to order the data.frame by the ID column, but according to the numbers within it (1, 2, 3, 5, 1008, 6, 10), so that the result looks like:

    ID  SomethingElse    SomeText
   ID_1             5    Thank
   ID_2             6      you
   ID_3             7      for
   ID_5             1      the
   ID_6             3     help
  ID_10             1        !
ID_1008             2        !

My problem is that the command df[order(df$ID),] sorts the result in lexicographical order, which is "wrong" for my purposes and looks like the following:

     ID SomethingElse SomeText
    ID_1             5    Thank
   ID_10             1        !
 ID_1008             2        !
    ID_2             6      you
    ID_3             7      for
    ID_5             1      the
    ID_6             3     help

Is there any smooth and fast one-liner to solve this issue?

@deset It is probably a good idea to strip off the "ID_" altogether or create a new variable using the inner part of akrun's code. — lmo, Jul 03 '16 at 14:33
@Imo Yeah, you are right, I thought of that too, but I'm using these IDs all over a much longer piece of code, and this is the first step where the order matters. I just shouldn't have used the "ID_" part from the beginning... — Deset, Jul 03 '16 at 14:56

akrun · Accepted Answer · 2016-07-03T14:52:35.453

We can use sub to strip the leading non-numeric characters, convert the result to numeric, and order by it:

df[order(as.numeric(sub("\\D+", "", df$ID))),]
#       ID SomethingElse SomeText
#1    ID_1             5    Thank
#2    ID_2             6      you
#3    ID_3             7      for
#4    ID_5             1      the
#6    ID_6             3     help
#7   ID_10             1        !
#5 ID_1008             2        !

The \\D+ matches one or more non-digit characters in each element of the 'ID' column (here the "ID_" prefix), and sub replaces that first match with ''

sub("\\D+", "", df$ID)
#[1] "1"    "2"    "3"    "5"    "1008" "6"    "10"
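
To make each part of the one-liner visible, here is a minimal step-by-step sketch of the same approach. The intermediate names (digits, num, idx, sorted) and the test value "ID_7_b" are made up for illustration and are not part of the original answer.

```r
# Same data as in the question
ID <- c("ID_1", "ID_2", "ID_3", "ID_5", "ID_1008", "ID_6", "ID_10")
SomethingElse <- c(5, 6, 7, 1, 2, 3, 1)
SomeText <- c("Thank", "you", "for", "the", "!", "help", "!")
df <- data.frame(ID, SomethingElse, SomeText)

# Step 1: drop the first run of non-digit characters ("ID_")
digits <- sub("\\D+", "", df$ID)

# Step 2: convert the remaining strings to numbers
num <- as.numeric(digits)

# Step 3: order() returns the row positions that sort 'num' ascending
idx <- order(num)

# Step 4: index the rows of the data frame by that permutation
sorted <- df[idx, ]

# Caveat: sub() replaces only the first match, gsub() replaces all of them.
# "ID_7_b" is a hypothetical value, not part of the question's data.
first_only <- sub("\\D+", "", "ID_7_b")   # "7_b"
all_runs   <- gsub("\\D+", "", "ID_7_b")  # "7"
```

For IDs of the form "ID_<number>" both sub and gsub give the same result; the difference only matters if non-digit characters can also appear after the number.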

edited Jul 03 '16 at 14:52

answered Jul 03 '16 at 14:24

It works just great(!), and thanks to your answer I came up with two different solutions, one using strsplit and the other substring. They are not as smooth and sophisticated as your answer. It's just that I'm a bit careful with solutions I don't understand completely (like the regular expression part of the answer). But that's just my personal "quirk". Anyway, thanks! – Deset Jul 03 '16 at 14:49
@Deset I added some explanations. Hope it helps – akrun Jul 03 '16 at 14:53
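
The strsplit and substring variants mentioned in the comment above were not posted in the thread. The following is only one plausible sketch of how they might look, assuming every ID has exactly the form "ID_<number>". The names ids, num_split, num_substr and the helper column ID_num are hypothetical; the helper column follows lmo's suggestion of storing the numeric part in a new variable.

```r
# Same data as in the question
ID <- c("ID_1", "ID_2", "ID_3", "ID_5", "ID_1008", "ID_6", "ID_10")
SomethingElse <- c(5, 6, 7, 1, 2, 3, 1)
SomeText <- c("Thank", "you", "for", "the", "!", "help", "!")
df <- data.frame(ID, SomethingElse, SomeText)

# as.character() guards against ID being a factor (the default before R 4.0)
ids <- as.character(df$ID)

# Variant A: split each ID at the literal "_" and keep the second piece
num_split <- as.numeric(sapply(strsplit(ids, "_", fixed = TRUE), `[`, 2))

# Variant B: skip the fixed three-character prefix "ID_"
num_substr <- as.numeric(substring(ids, 4))

# Either key can be used directly inside order()
by_split <- df[order(num_split), ]

# Or stored once in a helper column (hypothetical name) and reused later
df$ID_num <- num_substr
by_helper <- df[order(df$ID_num), ]
```

Variant B depends on the prefix always being three characters long, while Variant A depends on there being exactly one "_" before the number, so the regular expression in the accepted answer is probably the more forgiving choice if the ID format ever changes.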
