
I'm getting data from a MySQL table that has two columns (idDoc, tag), where each row records that a document has a given tag. I tried passing the resulting data frame to ddply like this:

ddply(tags,1)

My objective is to group the tags by document id. Say I do the following:

> x=c(1,1,2,2)
> y=c(4,5,6,7)
> data.frame(x,y)
  x y
1 1 4
2 1 5
3 2 6
4 2 7

My desired output would be something like a list of lists (or any other structure) that maps each id to its tags:

 1 -> c(4,5)
 2 -> c(6,7)
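In R terms, the mapping sketched above most naturally corresponds to a named list whose names are the ids (as character strings) and whose elements are the tag vectors. Written out by hand for the toy data, purely as an illustration of the target shape (the name `desired` is hypothetical):

```r
# Hypothetical target structure for the toy data above:
# list names are the ids, elements are that id's tags
desired <- list("1" = c(4, 5),
                "2" = c(6, 7))
str(desired)
```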

Regards

tonicebrian
  • ddply expects three arguments, not two. The first argument is the data.frame you want to summarize (looks like `tags` in your case), the second is the variable(s) you want to summarize by (`idDoc`?), and the third is the function or functions you want to apply to each group defined in the second argument. What do you mean by "association"? Check out this question for ideas on how to ask a better question (to get better answers): http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Chase Oct 18 '11 at 16:18
  • The third argument is optional and I thought that if not specified it just groups the elements. – tonicebrian Oct 18 '11 at 16:25
  • The third argument is 'optional' only in the sense that it won't generate a warning or error if omitted. The default value is `NULL`, which means if you omit it, you'll just get your original data back, unchanged. – joran Oct 18 '11 at 16:31
  • I rewrote the example, hoping that it is clearer now. – tonicebrian Oct 18 '11 at 16:42
  • Given your clarification, I think my answer _may_ achieve what you want. – joran Oct 18 '11 at 16:45
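To illustrate the comment thread above: plyr's grouping functions only produce a grouped result once the third argument (`.fun`) is supplied. A minimal sketch, assuming the plyr package is installed and the data frame is named `tags` with columns `idDoc` and `tag`. Because the desired result is a list rather than a data frame, it uses `dlply` (data frame in, list out) instead of `ddply`; the variable name `by_doc` is just for illustration:

```r
# Sketch only: this part needs the plyr package, so it is skipped if plyr is absent
if (requireNamespace("plyr", quietly = TRUE)) {
  tags <- data.frame(idDoc = c(1, 1, 2, 2),
                     tag   = c(4, 5, 6, 7))
  # .fun receives each idDoc subset as a data frame; return its tag column
  by_doc <- plyr::dlply(tags, "idDoc", function(d) d$tag)
  print(by_doc[["1"]])  # tags for document 1
}
```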

1 Answer


This is a bit of a shot in the dark: when you say you want an 'association', that doesn't precisely describe any particular R data structure, so it's unclear what form you want the output to take.

But one base R possibility would be to simply use split:

split(tags$tag, tags$idDoc)

which should return a named list where the names come from idDoc and each list element holds the tags associated with that idDoc value. Tags will be repeated, though, if the same (idDoc, tag) pair appears more than once. So maybe this would work better:
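Applied to the toy data from the question, `split` gives exactly the id-to-tags mapping. This sketch assumes the query result is stored in a data frame named `tags`; the last row is an invented duplicate, added only to show the caveat about repeated tags:

```r
# Toy stand-in for the MySQL result; the last row is a made-up duplicate
tags <- data.frame(idDoc = c(1, 1, 2, 2, 2),
                   tag   = c(4, 5, 6, 7, 7))

# Named list: names are the idDoc values, elements are the tags per document
by_doc <- split(tags$tag, tags$idDoc)
by_doc[["2"]]  # the duplicate 7 is kept
```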

tapply(tags$tag,tags$idDoc,FUN = unique)

which should return a list of unique tags for each idDoc.
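A sketch of the de-duplicated version on the same kind of toy data (again with one invented duplicate row, and assuming the data frame is named `tags`). Here `tapply` returns a one-dimensional array of mode list rather than a plain list; wrapping `split` in `lapply` is an equivalent alternative, not part of the original answer, that yields an ordinary named list:

```r
# Toy stand-in for the MySQL result; the last row is a made-up duplicate
tags <- data.frame(idDoc = c(1, 1, 2, 2, 2),
                   tag   = c(4, 5, 6, 7, 7))

# The answer's approach: a 1-d list array indexed by idDoc
uniq_tapply <- tapply(tags$tag, tags$idDoc, FUN = unique)

# Alternative (not from the original answer): a plain named list
uniq_list <- lapply(split(tags$tag, tags$idDoc), unique)

uniq_list[["2"]]  # duplicate removed
```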

(Edit: there's no need for an anonymous function; passing unique directly is enough.)

joran