Possible Duplicate:
Extracting indices for data frame rows that have MAX value for named field

Hello,

I have a data frame like this:

   A1 A3    d
1   a pr    5
2   a be    0
3   a cd    8
4   a dy    0
5   b pr    3
6   b be    4
7   b cd    9

etc...

For each unique value of A1, I want to keep only the row that has the maximum value of d.

The output should look like this:

A1 A3 d
a  cd 8
b  cd 9

etc..

The real data frame is bigger; this is just an example.

Can this be done in R, without looping or lengthy code?

Thanks

weblover
  • @Joris Meys: It is a duplicate, but I did not understand the whole method used there. I was able to generate the ids of all rows that have the max value, but I wasn't able to get a subset of the original data frame based on those ids. How can this be done? – weblover May 19 '11 at 11:47
  • First of all, read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/ thoroughly. Second, if you have the ids and you don't know how to subset (see e.g. ?subset, ?Extract, ...), it is very much time to start reading any of http://stats.stackexchange.com/questions/138/resources-for-learning-r – Joris Meys May 19 '11 at 11:51
  • @Joris Meys: Thanks for your reply. I know how to extract them, but I needed another way. – weblover May 19 '11 at 12:00

3 Answers

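The easiest way to do it is to sort the data frame by the d column in decreasing order, and then remove duplicates in the A1 column:

# Sort rows so that the largest d comes first
df2 <- df[order(df$d, decreasing = TRUE), ]
# Keep only the first row seen for each value of A1
df2[!duplicated(df2$A1), ]

This assumes that there is a single maximum per A1 group. If several rows tie for the maximum, only the first of them is kept and the others are lost.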

James
  • Thanks a lot, it worked :) I think there is more than one maximum value in each column, but I only need one. – weblover May 19 '11 at 11:59
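
To make the point about ties concrete, here is a minimal sketch in base R. The data frame below is hypothetical (it is not the asker's data) and was made up so that group "a" has two rows tied at the maximum. It contrasts keeping every tied row, using ave(), with the sort-and-deduplicate approach from the answer above, which keeps one row per group.

```r
# Hypothetical data: group "a" has two rows tied at d = 8
df <- data.frame(
  A1 = c("a", "a", "a", "b", "b"),
  A3 = c("pr", "cd", "dy", "be", "cd"),
  d  = c(8, 8, 0, 4, 9),
  stringsAsFactors = FALSE
)

# Per-row group maximum of d, aligned with the rows of df
group_max <- ave(df$d, df$A1, FUN = max)

# Keep every row that reaches its group's maximum (ties included)
all_max <- df[df$d == group_max, ]

# Keep a single row per group: sort by d, then drop repeated A1 values
one_max <- df[order(df$d, decreasing = TRUE), ]
one_max <- one_max[!duplicated(one_max$A1), ]
```

If only one row per group is needed, as in the comment above, the second form should be enough; the first form is useful when tied rows must not be dropped.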
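Probably (ddply comes from the plyr package):

library(plyr)
# For each value of A1, keep the row with the largest d
ddply(dfr, "A1", function(curdfr) curdfr[which.max(curdfr$d), ])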
Nick Sabbe
DATA

# Read the example data; the Lp column supplies the row names
mydf <- read.table(textConnection("
 Lp   A1 A3    d
 1   a pr    5
 2   a be    0
 3   a cd    8
 4   a dy    0
 5   b pr    3
 6   b be    4
 7   b cd    9"), header = TRUE, row.names = "Lp")

CODE

library(data.table)
mydf <- data.table(mydf)
# Within each A1 group, keep the row of .SD with the largest d
mydf[, .SD[which.max(d)], by = A1]
Wojciech Sobala
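
For comparison, the same result can probably be obtained without any extra packages. The following is only a sketch in base R, not one of the posted answers. It rebuilds the example data from the question, and the names pieces and top_rows are made up for illustration.

```r
# Example data from the question
mydf <- data.frame(
  A1 = c("a", "a", "a", "a", "b", "b", "b"),
  A3 = c("pr", "be", "cd", "dy", "pr", "be", "cd"),
  d  = c(5, 0, 8, 0, 3, 4, 9),
  stringsAsFactors = FALSE
)

# Split the data frame into one piece per value of A1
pieces <- split(mydf, mydf$A1)

# From each piece take the first row with the largest d, then bind the rows back together
top_rows <- do.call(rbind, lapply(pieces, function(g) g[which.max(g$d), ]))
```

Like the other which.max() solutions, this keeps only the first maximum in each group when there are ties.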