Possible Duplicate:
How to sort a dataframe by column(s) in R

I was wondering if someone could help me out. I have what I thought should be an easy problem to solve.

I have the table below:

SampleID     Cluster
R0132F041p   1
R0132F127    1
R0132F064    1
R0132F068p   1
R0132F015    2
R0132F094    3
R0132F105    1
R0132F013    2
R0132F114    1
R0132F014    2
R0132F039p   3
R0132F137    1
R0132F059    1
R0132F138p   2
R0132F038p   2

and I would like to sort/order it by Cluster to get the results as below:

SampleID     Cluster
R0132F041p   1
R0132F127    1
R0132F064    1
R0132F068p   1
R0132F105    1
R0132F114    1
R0132F137    1
R0132F059    1
R0132F015    2
R0132F013    2
R0132F014    2
R0132F138p   2
R0132F038p   2
R0132F094    3
R0132F039p   3

I have tried the following R code:

data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')

data <- data.frame(data)
data <- data[order(data$Cluster),]
write.table(data, file = 'OrderedTable.txt', append = TRUE,quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE, col.names = FALSE)

and get the following output:

1   1
2   1
3   1
4   1
5   1
6   1
7   1
8   1
9   2
10  2
11  2
12  2
13  2
14  3
15  3

Why have the SampleIDs been replaced by the numbers 1-15, and what do these numbers represent? I have read the ?order page, but it seems to explain sort.list better than order(). If anyone could help me out with this I would be very grateful.

sinead

2 Answers

The short answer is that you did the sorting perfectly. You are just having some difficulty with reading and writing files. Going through your code:

data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')

The above line is reading in your data fine, but the row.names=1 told it to use the first column as names for rows. So now your SampleIDs are row names instead of being their own column. If you type data or head(data) or str(data) immediately after running this line, this should be clear. Just omit that row.names argument and it will read properly.
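To make the effect of row.names=1 concrete, here is a minimal sketch. It uses three rows from the question supplied inline through the text argument instead of reading Table.txt; the variable names txt, with_rn and without_rn are assumptions made for illustration only.

```r
# Three rows from the question, tab-separated, supplied inline via text=
# (a stand-in for Table.txt, which is not available here)
txt <- "SampleID\tCluster\nR0132F041p\t1\nR0132F015\t2\nR0132F094\t3"

# row.names = 1 moves the first column into the row names
with_rn <- read.table(text = txt, header = TRUE, row.names = 1, sep = "\t")
names(with_rn)      # only "Cluster" is left as a column
rownames(with_rn)   # the SampleIDs now live here

# Without row.names, SampleID stays an ordinary column
without_rn <- read.table(text = txt, header = TRUE, sep = "\t")
names(without_rn)   # "SampleID" "Cluster"
```

Comparing names() on the two objects shows where the SampleIDs ended up in each case.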

data <- data.frame(data)

You don't need the line above because read.table() already returns a data frame. You can confirm that with str(data) as well.

data <- data[order(data$Cluster),]

The above line is perfect.
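Since the question asks what order() actually does, a small sketch may help: order() does not return sorted values, it returns the positions that would put the vector in sorted order, and those positions are then used as row indices. The short cluster vector below is illustrative, not the full table.

```r
cluster <- c(1, 1, 2, 3, 1, 2)   # illustrative values, not the full table

# order() gives the positions of the elements in ascending order,
# keeping tied values in their original relative order
idx <- order(cluster)
idx            # 1 2 5 3 6 4

# Indexing with those positions yields the sorted values
cluster[idx]   # 1 1 1 2 2 3
```

In data[order(data$Cluster), ] the same positions select whole rows, so every column (and the row names) moves together with its Cluster value.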

write.table(data, file = 'OrderedTable.txt', append = TRUE,
   quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE, 
   col.names = FALSE)

Here you included the argument col.names = FALSE, which is why your file doesn't have column names. You also don't need or want append = TRUE. help(write.table) says it is "only relevant if file is a character string"; when it is TRUE, the output is appended to the file, so each run adds rows to any existing OrderedTable.txt instead of replacing it.
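A sketch of a write.table() call without those two arguments might look like the following. It assumes SampleID was read as an ordinary column (that is, row.names=1 was omitted), so row.names = FALSE is used to avoid writing an extra column of row numbers. The three-row data frame and the temporary file standing in for OrderedTable.txt are assumptions for illustration.

```r
# Small stand-in for the table from the question
data <- data.frame(SampleID = c("R0132F015", "R0132F041p", "R0132F094"),
                   Cluster  = c(2, 1, 3))
data <- data[order(data$Cluster), ]

# Temporary file used here in place of 'OrderedTable.txt'
out <- tempfile(fileext = ".txt")

# No append = TRUE, header kept, row numbers not written
write.table(data, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

readLines(out)   # header line followed by the three sorted rows
```

If the SampleIDs are kept as row names instead, row.names = TRUE would be needed to write them out.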

The numbers 1-15 in your result look like row numbers. You don't explain how you look at the resulting file, so I cannot be sure. You likely read your file in a way that doesn't parse the row.names and is showing row numbers instead. If you make certain your SampleIDs column does not get assigned to be names of rows, you'll probably be fine.

MattBagg
  • Thank you so much, Thats working perfectly now, and was explained really well, you are a star. – sinead Nov 15 '12 at 13:58
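Have a look at the arrange function of the plyr package. It returns the sorted data frame rather than modifying data in place, so assign the result before writing it out:

library(plyr)
data <- arrange(data, Cluster)
write.table(data, "ordered_data.txt")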
Markus