
Possible Duplicate:
Select only the first rows for each unique value of a column in R

I have a matrix of the following form:

col1 col2

1    2   
1    2    
1    2   
1    2   
1    2
2    5
2    5
2    5
3    7
3    7
3    7
3    7
3    7
3    7
3    7
3    7
4    2 
4    2 
4    2

I would like to select the unique rows based on 'col1', which in this case would be the first row for each unique value in col1.

Desired subset:

col1   col2
1      2 
2      5
3      7
4      2

Here's what I've tried:

# data: https://dl.dropbox.com/u/22681355/matrix.csv
mat <- read.csv("matrix.csv")
sub <- unique(mat$V1)
subset(mat, mat == c(sub))

It returns many more rows than I expect, and I get this warning message:

Warning message: In contacts$V1 == c(g) : longer object length is not a multiple of shorter object length

user1723765
  • Since this is going to be asked anyway, I'll go first: What have you tried? Is there a particular piece of code you're having trouble with? – J. Steen Dec 12 '12 at 15:59

1 Answer


You can use the unique function:

unique(mat$V1) # and not matrix$v1
[1]   44  281 1312

You can also write

unique(mat)

and it will give you the unique rows (I tried it on your file).
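
To illustrate the difference between the two calls, here is a small sketch on a toy data frame modeled on the example in the question. The name `toy` and its values are made up for illustration and are not taken from matrix.csv.

```r
# Toy data modeled on the question's example (hypothetical, not from matrix.csv)
toy <- data.frame(col1 = c(1, 1, 2, 2, 3), col2 = c(2, 2, 5, 5, 7))

# On a single column, unique() returns the distinct values as a vector
vals <- unique(toy$col1)

# On a whole data frame, unique() drops rows that repeat across ALL columns
rows <- unique(toy)
```

Note that `unique(toy)` compares entire rows, so two rows that share a `col1` value but differ in `col2` would both be kept.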

If you want to select rows based on the values of V1, you can do this:

> mat[!duplicated(mat$V1), ]
       X   V1 V2 V3 V4  V5 V6 V7 V8 V9 V10
1   1547   44 14  1  2 100 17  0  0  0   0
23  5385  281 67  2 10 100 10  0  0  0   0
33 17347 1312  1  2  6 100  8  0  0  0   0
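
The same idiom can be tried on the small example from the question. This is only a sketch: the data frame `ex` is rebuilt by hand from the table in the question, so its name and construction are assumptions rather than the contents of matrix.csv.

```r
# Rebuild the question's example table by hand (assumed layout)
ex <- data.frame(
  col1 = rep(c(1, 2, 3, 4), times = c(5, 3, 8, 3)),
  col2 = rep(c(2, 5, 7, 2), times = c(5, 3, 8, 3))
)

# duplicated() flags every repeat of a value after its first occurrence,
# so negating it keeps only the first row for each value of col1
first_rows <- ex[!duplicated(ex$col1), ]
```

The warning in the question arises because `==` compares element by element and recycles the shorter vector; it does not test whether each value belongs to a set. Indexing with `!duplicated(...)` avoids that comparison altogether.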
alestanis