
Say I have:

#Continuous variable 1
x1<-rnorm(15, 1, .5)
#Continuous Variable 2
x2<-rnorm(15,1,.5)
#Sample Names
s.names<-c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")

df.temp<-data.frame(s.names,x1,x2)

df.temp
   s.names        x1        x2
1       S3 0.7025583 1.6616103
2       S5 0.4401055 1.5715047
3       S8 1.3691886 0.7754010
4      S14 1.1365712 1.2697196
5      S11 2.1193612 0.5968068
6      S13 0.6834145 1.4669863
7      S15 0.7050808 1.3287179
8      S12 2.0293910 0.7502497
9      S10 0.6807918 1.0793561
10      S2 0.6809873 0.7454851
11      S1 0.3775086 0.3150030
12      S6 2.1235465 1.4864190
13      S7 1.1657259 1.3279573
14      S4 1.4629794 0.6146412
15      S9 0.6916639 0.4507309

Now let's try to order it:

df.temp[order(df.temp$s.names),]
   s.names        x1        x2
11      S1 0.3775086 0.3150030
9      S10 0.6807918 1.0793561
5      S11 2.1193612 0.5968068
8      S12 2.0293910 0.7502497
6      S13 0.6834145 1.4669863
4      S14 1.1365712 1.2697196
7      S15 0.7050808 1.3287179
10      S2 0.6809873 0.7454851
1       S3 0.7025583 1.6616103
14      S4 1.4629794 0.6146412
2       S5 0.4401055 1.5715047
12      S6 2.1235465 1.4864190
13      S7 1.1657259 1.3279573
3       S8 1.3691886 0.7754010
15      S9 0.6916639 0.4507309

But my issue is that I now have trouble manipulating the data frame. In particular, when I try to order or sort by s.names, it always returns something along the lines of S1, S10, S11, S12, ..., S2, S20, S21, S3, S4, S5, S6, S7, etc. (not 21 samples here, but see the example above). The reason is, of course, that I am trying to re-arrange the data frame by rows, and order() and sort() have had issues with this.

Additionally, I am wondering how to "bootstrap", i.e. randomly rearrange the rows for statistical reasons, while keeping the samples linked to their values: S1 will still have its corresponding x1 and x2 values, just in a different, perhaps random, order, e.g. S5, S11, S6, etc.

My end goal is to run analyses such as ANOVA, cov() and cor().

EDIT: Added more code

Molx
Sean
  • Don't use `cbind`; use `data.frame`, like `data.frame(s.names,x1,x2)`. `cbind` makes a matrix, and all elements of a matrix share the same data type (character in this case, which is sorted lexicographically). – Frank Jul 27 '15 at 00:19
  • s.names is a vector. You can check this by using is.vector(s.names). Additionally, I am not as interested in sorting by alphabetical order as in re-arranging the ROWS of the data frame for analysis. Read the full paragraph. The issue with numerical ordering/sorting is occurring (I can easily get rid of the "S" if I wanted to by using the substr() function). – Sean Jul 27 '15 at 00:36
  • I still get the same issue as above when using the link from alexforrence. – Sean Jul 27 '15 at 00:44
  • Based on the edits, http://stackoverflow.com/questions/17531403/how-to-sort-a-character-vector-where-elements-contain-letters-and-numbers-in-r might be closer. – alexforrence Jul 27 '15 at 02:29
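To illustrate Frank's point about `cbind`, here is a small sketch that rebuilds the question's vectors and compares the two constructors. The seed is arbitrary and the values do not matter. `stringsAsFactors = FALSE` is passed explicitly so that the result does not depend on the R version, because before R 4.0 `data.frame()` turned strings into factors by default.

```r
set.seed(1)  # arbitrary seed; only the column types matter here
x1 <- rnorm(15, 1, .5)
x2 <- rnorm(15, 1, .5)
s.names <- c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")

# cbind() on mixed vectors builds a matrix, and a matrix holds a single type,
# so the numeric columns are coerced to character strings
m <- cbind(s.names, x1, x2)
print(typeof(m))  # "character"

# data.frame() keeps one type per column, so x1 and x2 stay numeric
df.temp <- data.frame(s.names, x1, x2, stringsAsFactors = FALSE)
print(sapply(df.temp, class))
```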

3 Answers


Your problem is caused by sorting a string column as if it were a numeric column. If all the elements begin with S, you can strip the S and make them numeric:

> x <- paste0("S", 1:20)
> x
 [1] "S1"  "S2"  "S3"  "S4"  "S5"  "S6"  "S7"  "S8"  "S9"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"
> sort(x)
 [1] "S1"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18" "S19" "S2"  "S20" "S3"  "S4"  "S5"  "S6"  "S7" 
[19] "S8"  "S9" 
> x2 <- sort(x)
> x2 <- as.numeric(gsub("[^0-9]", "", x2))
> sort(x2)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

If you don't want to remove the leading S, you can use order() on the extracted numbers like this:

> x[order(as.numeric(gsub("[^0-9]", "", x)))]

Or, in this example, since x2 was built from sort(x) rather than from x, index sort(x) with it:

> sort(x)[order(x2)]

Both result in

 [1] "S1"  "S2"  "S3"  "S4"  "S5"  "S6"  "S7"  "S8"  "S9"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"
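Here is the same idea applied to a data frame shaped like the one in the question. You extract the digits, convert them to numbers, and pass the result to order(). Because order() returns row positions, whole rows move together. This is a minimal sketch. It assumes every name is a letter prefix followed by digits, and the seed and the names `num` and `sorted` are only for illustration.

```r
set.seed(42)  # arbitrary seed so the example is reproducible
s.names <- c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")
df.temp <- data.frame(s.names, x1 = rnorm(15, 1, .5), x2 = rnorm(15, 1, .5),
                      stringsAsFactors = FALSE)

# Strip every non-digit character and convert what is left to a number
num <- as.numeric(gsub("[^0-9]", "", df.temp$s.names))

# order() gives row positions, so each sample keeps its own x1 and x2
sorted <- df.temp[order(num), ]
print(sorted)
```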

Your "Additionally" paragraph isn't very clear, but if it's a different problem, you should ask a new question.
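On the "Additionally" part: one common base-R way to shuffle or resample rows while keeping each sample's x1 and x2 together is to index the rows with sample(). This is a minimal sketch. It assumes that either a permutation of the rows (no replacement) or a bootstrap resample (with replacement) is what's wanted. The seed and the names `shuffled` and `boot` are illustrative.

```r
set.seed(7)  # arbitrary seed
df.temp <- data.frame(s.names = paste0("S", 1:15),
                      x1 = rnorm(15, 1, .5), x2 = rnorm(15, 1, .5),
                      stringsAsFactors = FALSE)

# Permutation: every row appears exactly once, in random order
shuffled <- df.temp[sample(nrow(df.temp)), ]

# Bootstrap resample: draw nrow(df.temp) rows with replacement
boot <- df.temp[sample(nrow(df.temp), replace = TRUE), ]

# Rows move as units, so statistics can be computed on the resampled rows
print(head(shuffled))
print(cor(boot$x1, boot$x2))
```

Because whole rows are selected, S5 always carries its own x1 and x2, whatever position it ends up in.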

Molx

Same approach as @Molx, this time with a different function:

df.temp[order(as.numeric(substr(df.temp$s.names,2,3))),]

With your data, this should give you what you want. The problem is that you're sorting strings, and strings sort in alphabetical (not numerical) order.
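Note that substr(df.temp$s.names, 2, 3) keeps at most two characters after the leading "S", so it relies on the sample numbers having no more than two digits. The sketch below shows what happens with a hypothetical name "S100". It also shows a variant that uses sub() to drop only the prefix.

```r
x <- c("S2", "S100", "S10", "S1")  # "S100" is a hypothetical name

# Characters 2..3 only: "S100" is truncated to "10"
print(substr(x, 2, 3))  # "2" "10" "10" "1"

# Remove just the leading "S", whatever the number of digits
num <- as.numeric(sub("^S", "", x))
print(x[order(num)])  # "S1" "S2" "S10" "S100"
```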

PavoDive

gtools package:

library(gtools)
# as.character() in case s.names is a factor (the data.frame() default before R 4.0)
df.temp[mixedorder(as.character(df.temp$s.names)), ]

Another base alternative:

n <- df.temp$s.names[order(as.numeric(gsub("S", "", df.temp$s.names)))]
df.temp[match(n, df.temp$s.names), ]

Output:

   s.names         x1          x2
11      S1  1.2285667  1.48669700
10      S2  0.9438498  0.01775496
1       S3  1.3671933  1.66880402
14      S4  0.7718479  1.53751408
2       S5  0.6023717  0.94600954
12      S6 -0.1341811  1.17744773
13      S7  1.1150349 -0.24347135
3       S8  0.3934848  0.90117148
15      S9  1.7059979  1.64684407
9      S10  0.7533375  1.05615732
5      S11  0.6980853  0.46164739
8      S12  0.3826094  1.26324581
6      S13  0.9616772  1.58527306
4      S14 -0.1876272  1.05792541
7      S15  1.4213483  0.96066296
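In the base alternative, order() already returns row positions, so the match() step looks redundant. Indexing the rows with the order() result directly should give the same data frame, assuming the sample names are unique. The quick check below compares the two. The seed and the names `via_match` and `via_order` are illustrative.

```r
set.seed(3)  # arbitrary seed
df.temp <- data.frame(s.names = c("S3","S5","S8","S14","S11","S13","S15","S12",
                                  "S10","S2","S1","S6","S7","S4","S9"),
                      x1 = rnorm(15, 1, .5), x2 = rnorm(15, 1, .5),
                      stringsAsFactors = FALSE)

key <- as.numeric(gsub("S", "", df.temp$s.names))

# Two-step version: reorder the names, then match() them back to rows
n <- df.temp$s.names[order(key)]
via_match <- df.temp[match(n, df.temp$s.names), ]

# One-step version: use the order() positions as row indices directly
via_order <- df.temp[order(key), ]

print(identical(via_match, via_order))
```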

sqldf package:

library(sqldf)
sqldf("SELECT *, 
      ltrim([s.names],'S') AS n
      FROM [df.temp] ORDER BY n*1")

Output:

   s.names         x1          x2  n
1       S1  1.2285667  1.48669700  1
2       S2  0.9438498  0.01775496  2
3       S3  1.3671933  1.66880402  3
4       S4  0.7718479  1.53751408  4
5       S5  0.6023717  0.94600954  5
6       S6 -0.1341811  1.17744773  6
7       S7  1.1150349 -0.24347135  7
8       S8  0.3934848  0.90117148  8
9       S9  1.7059979  1.64684407  9
10     S10  0.7533375  1.05615732 10
11     S11  0.6980853  0.46164739 11
12     S12  0.3826094  1.26324581 12
13     S13  0.9616772  1.58527306 13
14     S14 -0.1876272  1.05792541 14
15     S15  1.4213483  0.96066296 15
mpalanco