
I have the following data.frame:

ID<-c("ID_1","ID_2","ID_3","ID_5","ID_1008","ID_6","ID_10")
SomethingElse<-c(5,6,7,1,2,3,1)
SomeText<-c("Thank","you","for","the","!","help","!")
df<-data.frame(ID,SomethingElse,SomeText)

What I need is to order the data.frame by the ID column, but according to the numbers within it (1, 2, 3, 5, 1008, 6, 10), so that the result looks like:

    ID  SomethingElse    SomeText
   ID_1             5    Thank
   ID_2             6      you
   ID_3             7      for
   ID_5             1      the
   ID_6             3     help
  ID_10             1        !
ID_1008             2        !

My problem is that df[order(df$ID),] sorts the IDs in lexicographical order, which is "wrong" for my purposes and gives the following:

     ID SomethingElse SomeText
    ID_1             5    Thank
   ID_10             1        !
 ID_1008             2        !
    ID_2             6      you
    ID_3             7      for
    ID_5             1      the
    ID_6             3     help

Is there any smooth and fast one-liner to solve this issue?

Deset
  • @user2100721 the `mixedorder` will not work on this. – akrun Jul 03 '16 at 14:28
  • @deset It is probably a good idea to strip off the "ID_" altogether or create a new variable using the inner part of akrun's code. – lmo Jul 03 '16 at 14:33
  • @lmo Yeah, you are right, I thought of that too, but I'm using these IDs all over a much longer piece of code, and this is the first step where the order matters. I just shouldn't have used the "ID_" part from the beginning... – Deset Jul 03 '16 at 14:56

1 Answer


We can use sub to remove the non-numeric characters, convert the result to numeric, and order the rows by it:

df[order(as.numeric(sub("\\D+", "", df$ID))),]
#       ID SomethingElse SomeText
#1    ID_1             5    Thank
#2    ID_2             6      you
#3    ID_3             7      for
#4    ID_5             1      the
#6    ID_6             3     help
#7   ID_10             1        !
#5 ID_1008             2        !

The pattern \\D+ matches one or more consecutive non-digit characters in the 'ID' column (here the "ID_" prefix), and sub replaces the first such match with the empty string '':

sub("\\D+", "", df$ID)
#[1] "1"    "2"    "3"    "5"    "1008" "6"    "10"  
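
If the numeric part is needed in several places, it can be stored once and reused, along the lines suggested in the comments. The following is only a sketch: the column name ID_num is a hypothetical choice, not something from the question. It also shows how sub and gsub differ on a made-up ID that contains a second run of non-digits.

```r
ID <- c("ID_1", "ID_2", "ID_3", "ID_5", "ID_1008", "ID_6", "ID_10")
SomethingElse <- c(5, 6, 7, 1, 2, 3, 1)
SomeText <- c("Thank", "you", "for", "the", "!", "help", "!")
df <- data.frame(ID, SomethingElse, SomeText)

# Hypothetical helper column holding the numeric part of each ID
df$ID_num <- as.numeric(sub("\\D+", "", df$ID))

# Order the rows by the helper column instead of the character ID
sorted <- df[order(df$ID_num), ]

# sub() removes only the first run of non-digits, gsub() removes every run.
# "ID_12_b" is a made-up ID used purely to show the difference.
first_only <- sub("\\D+", "", "ID_12_b")   # "12_b"
all_runs   <- gsub("\\D+", "", "ID_12_b")  # "12"
```

For IDs shaped like the ones in the question (a single "ID_" prefix followed by digits), sub and gsub give the same result, so either would work here.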
akrun
  • It works just great(!), and thanks to your answer I came up with two different solutions, one using strsplit and the other substring. They are not as smooth and sophisticated as your answer. It's just that I'm a bit careful with solutions I don't completely understand (like the regular expression part of the answer). But that's just my personal "quirk". Anyway, thanks! – Deset Jul 03 '16 at 14:49
  • @Deset I added some explanations. Hope it helps – akrun Jul 03 '16 at 14:53
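
The strsplit and substring variants mentioned in the comment above were not posted, so the following is only a guess at what they might look like. It assumes every ID has the exact form "ID_" followed by digits; the names num_split and num_substr are made up for illustration.

```r
ID <- c("ID_1", "ID_2", "ID_3", "ID_5", "ID_1008", "ID_6", "ID_10")

# strsplit variant: split each ID on the literal "_" and keep the second piece
num_split <- as.numeric(sapply(strsplit(ID, "_", fixed = TRUE), `[`, 2))

# substring variant: drop the first three characters ("ID_")
num_substr <- as.numeric(substring(ID, 4))

# Either vector can be passed to order(), e.g. df[order(num_split), ]
ord <- order(num_split)
```

Both variants avoid regular expressions, but they depend on the fixed "ID_" prefix, whereas the sub approach only assumes that the digits follow a single run of non-digit characters.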