I encountered some strange behavior with the order() function. I have two datasets (training and testing), created by this code:

train.part <- 0.25
train.ind <- sample.int(n=nrow(newdata), size=floor(train.part*nrow(newdata)), replace=FALSE)
train.set <- newdata[train.ind,]
test.set <- newdata[-train.ind,]

When I try to order train.set by:

train.set <- newdata[train.ind,]

That works fine, but the second dataset does not sort correctly. Before sorting:

> test.set
     noise.Y noise.Rec
1   7.226370  86.23327
2   3.965446  85.24321
3   5.896981  84.70086
4   4.101038  85.51946
5   7.965455  85.46091
6   8.329555  86.83667
8   6.579297  85.59717
9   7.392187  85.51699
10  5.878640  86.95244
...

after sorting:

> test.set <- test.set[order(noise.Y),]
> test.set
        noise.Y noise.Rec
2      3.965446  85.24321
4      4.101038  85.51946
11     7.109978  87.44713
...
NA           NA        NA
NA.1         NA        NA
50    17.009351  92.36286
NA.2         NA        NA
48    15.452493  92.09277
53    16.514639  91.57661
NA.3         NA        NA
...

The result is not properly sorted and contains a lot of unexpected NAs.

What's the reason? Thanks!

Scheff's Cat
Aggle

1 Answer

It works for me when the column is referenced through the data frame:

test.set <- test.set[order(test.set$noise.Y),]
    noise.Y noise.Rec
2  3.965446  85.24321
4  4.101038  85.51946
10 5.878640  86.95244
3  5.896981  84.70086
8  6.579297  85.59717
1  7.226370  86.23327
9  7.392187  85.51699
5  7.965455  85.46091
6  8.329555  86.83667
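
One possible explanation for the NA rows, offered as a guess because the question does not show the workspace: if a free-standing vector called noise.Y exists (for example, one as long as newdata), then order(noise.Y) returns indices for that longer vector, and any index beyond nrow(test.set) produces an NA row. The sketch below reproduces that situation with made-up data; the values, the seed, and the existence of the stand-alone noise.Y vector are all assumptions.

```r
# Hypothetical stand-in for the asker's data; the values are made up.
set.seed(1)
newdata <- data.frame(noise.Y = runif(20, 3, 18),
                      noise.Rec = runif(20, 84, 93))

# Assumption: a free-standing vector with the same name as the column
# exists in the workspace (e.g. left over from building newdata).
noise.Y <- newdata$noise.Y

train.ind <- sample.int(n = nrow(newdata), size = floor(0.25 * nrow(newdata)))
test.set <- newdata[-train.ind, ]   # 15 of the 20 rows

# order(noise.Y) sees the 20-element vector and returns indices 1..20,
# but test.set has only 15 rows: out-of-range indices become NA rows.
bad <- test.set[order(noise.Y), ]
nrow(bad)                 # 20
sum(is.na(bad$noise.Y))   # 5

# Referencing the column through the data frame keeps the lengths in sync.
good <- test.set[order(test.set$noise.Y), ]
nrow(good)                # 15
```

If this is what happened, the NA, NA.1, NA.2 row names in the question are the ones R generates for out-of-range row indices.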

Note that if you want the row names to be consecutive after sorting, you can simply do:

row.names(test.set) <- NULL
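
As a small illustration of the row-name reset, the sketch below sorts a three-row data frame built from the first three rows shown in the question. The object names df and sorted are only for this example.

```r
# First three rows of the question's test.set, used as sample data.
df <- data.frame(noise.Y = c(7.226370, 3.965446, 5.896981),
                 noise.Rec = c(86.23327, 85.24321, 84.70086))

sorted <- df[order(df$noise.Y), ]
row.names(sorted)   # "2" "3" "1": the original positions are kept

# Assigning NULL replaces the row names with consecutive integers.
row.names(sorted) <- NULL
row.names(sorted)   # "1" "2" "3"
```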
Rui Barradas