
I have data of the following form:

"almond" "blueberry" 3
"almond" "leek" 6
"almond" "citron" 7
"almond" "fish" 2
...
"leek" "swiss_cheese" 3
"leek" "pumpkin" 5
"leek" "onion" 4
"leek" "chocolate" 10
...

For each value in the first column I want to find the k best partners according to the third column, where "best" means a lower number in the third column. Thus, for almond its three best partners are fish, blueberry, and leek; for leek they are swiss_cheese, onion, and pumpkin. Finally, I want to reduce the full table to the three best partners for each of the factors in the first column, i.e.
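For comparison, the "k best partners per group" rule can also be sketched in base R without plyr: sort by the first column and then by the third, and keep the first k rows of each group. This is a minimal sketch assuming the data sit in a data frame with columns V1, V2, V3 (the names `read.table` gives by default); `pairs` and `top_k_partners` are hypothetical names, and only the rows shown above are used.

```r
# Sample rows from the question (the "..." rows are unknown and omitted)
pairs <- data.frame(
  V1 = c("almond", "almond", "almond", "almond", "leek", "leek", "leek", "leek"),
  V2 = c("blueberry", "leek", "citron", "fish",
         "swiss_cheese", "pumpkin", "onion", "chocolate"),
  V3 = c(3, 6, 7, 2, 3, 5, 4, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the k rows with the smallest V3 within each V1 group
top_k_partners <- function(df, k) {
  # Sort by group, then ascending score, so the best partners come first
  sorted <- df[order(df$V1, df$V3), ]
  # Position of each row within its group (1, 2, 3, ... per V1 value)
  pos <- ave(seq_len(nrow(sorted)), sorted$V1, FUN = seq_along)
  result <- sorted[pos <= k, ]
  rownames(result) <- NULL
  result
}

best <- top_k_partners(pairs, 3)
print(best)
```

Because `order()` leaves unresolved ties in their original order, rows with equal scores are kept in the order they appear in the data.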

"almond" "blueberry" 3
"almond" "leek" 6
"almond" "fish" 2
...
"leek" "swiss_cheese" 3
"leek" "pumpkin" 5
"leek" "onion" 4
...
networker
  • I suggest you provide an example data set and the answer you want. Then maybe somebody can provide the R code needed to convert the example data set into the desired answer. – Mark Miller Nov 02 '12 at 23:53
  • For tips on making it easier for people to help you: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Dason Nov 03 '12 at 18:09
  • Hi Dason, I read most of this. Still, I do not see what is lacking to answer this specific example. There is a sample data set and a desired result. Can you help me find the weak spot? Obviously, for me, my question is clear ;-) – networker Nov 03 '12 at 18:13
  • Your example data is just some strings on a page. It is not clear they are in a `data.frame` or what kind of object. The first three (and more) answers to the linked FAQ show how to recreate example data sets as part of a question – mnel Nov 05 '12 at 00:29
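Following mnel's suggestion, here is one way the sample rows could be turned into a reproducible R object. It assumes the rows are whitespace-separated with quoted strings exactly as printed in the question; `raw` and `pairs` are illustrative names, and the "..." rows are left out because their content is unknown.

```r
# The question's visible rows pasted as text
raw <- '
"almond" "blueberry" 3
"almond" "leek" 6
"almond" "citron" 7
"almond" "fish" 2
"leek" "swiss_cheese" 3
"leek" "pumpkin" 5
"leek" "onion" 4
"leek" "chocolate" 10
'

# read.table splits on whitespace and strips the quotes; with header = FALSE
# the columns are named V1, V2, V3, matching the names used in the answer
pairs <- read.table(text = raw, header = FALSE, stringsAsFactors = FALSE)
str(pairs)

# dput() prints R code that recreates the object exactly, ready to paste into a question
dput(pairs)
```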

1 Answer


So, one way I found to do this:

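library(plyr)
# table: data frame with columns V1, V2, V3; k: number of partners to keep
ranked <- ddply(table, .(V1), transform, rank = rank(V3, ties.method = "first"))
best <- ranked[ranked$rank <= k, ]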

The first line loads the package if it is not already loaded. The `ddply` call then adds a last column named `rank` to `table`, holding the rank of each row by the third column (`V3`), computed separately within each group of the first column (`V1`). That is, for each distinct value in the first column, the rows containing it are ranked among themselves. The last step keeps only the rows whose rank is less than or equal to the given k.
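The same within-group ranking can be sketched with base R's `ave()` instead of `ddply()`, again on only the rows shown in the question (`pairs` is a hypothetical name). One caveat: `rank()` averages ties by default, so two rows tied for second place both get 2.5 and a `rank <= k` filter can keep fewer than k rows; `ties.method = "first"` is one way to give tied rows distinct ranks, breaking ties by row order.

```r
# Sample rows from the question (the "..." rows are unknown and omitted)
pairs <- data.frame(
  V1 = c("almond", "almond", "almond", "almond", "leek", "leek", "leek", "leek"),
  V2 = c("blueberry", "leek", "citron", "fish",
         "swiss_cheese", "pumpkin", "onion", "chocolate"),
  V3 = c(3, 6, 7, 2, 3, 5, 4, 10),
  stringsAsFactors = FALSE
)
k <- 3  # number of best partners to keep per V1 value

# Rank V3 separately within each V1 group; ties get distinct ranks by row order
pairs$rank <- ave(pairs$V3, pairs$V1,
                  FUN = function(x) rank(x, ties.method = "first"))

# Keep only the k best-ranked rows of each group
best <- pairs[pairs$rank <= k, ]
print(best)
```

Unlike `ddply()`, which returns the rows grouped and sorted by `V1`, `ave()` keeps the original row order, so `best` lists the partners in the same order as the desired output in the question.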

networker