
I have a tidy data frame of the form

  df <- data.frame(topic = c(1, 1, 2, 2, 3, 3),
                   term  = c("will", "eat", "go", "fun", "good", "bad"),
                   score = c(0.3, 0.2, 0.5, 0.4, 0.1, 0.05),
                   stringsAsFactors = FALSE)
  df

      topic term score
    1     1 will  0.30
    2     1  eat  0.20
    3     2   go  0.50
    4     2  fun  0.40
    5     3 good  0.10
    6     3  bad  0.05

The table stores the top n (here, n = 2) highest-scoring terms for each topic. It is easy to work with, but I want to view the data like this:

      topic1  topic2  topic3
   1    will      go    good
   2     eat     fun     bad

In the new table I don't need the scores; I just want to see the top n terms for each topic, one topic per column. I feel like this should be doable with dplyr or something similar, but I'm not great with R.

tddevlin
Comment: For the linked post, just use `dat[, 1:2]` or `dat[, -3]` rather than `dat` as the main argument; this drops the third (`score`) column. – lmo Jul 21 '17 at 16:12
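One possible base R reading of this comment, sketched under the assumption that every topic has the same number of terms: drop `score` with `df[, -3]`, then let `unstack()` put each topic's terms in its own column. The linked post is not reproduced here, so `unstack()` is a swapped-in illustration, not necessarily the method that post uses; the `topic` column prefix is also added only for display.

```r
# Same sample data as in the question
df <- data.frame(topic = c(1, 1, 2, 2, 3, 3),
                 term  = c("will", "eat", "go", "fun", "good", "bad"),
                 score = c(0.3, 0.2, 0.5, 0.4, 0.1, 0.05),
                 stringsAsFactors = FALSE)

# Drop the score column (as the comment suggests), then give each topic its own
# column of terms. unstack() returns a data frame only when every topic has the
# same number of terms; with unequal group sizes it returns a list instead.
wide <- unstack(df[, -3], term ~ topic)

# unstack() orders columns by the sorted topic values; relabel them for display
names(wide) <- paste0("topic", sort(unique(df$topic)))
wide
```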

1 Answer

library(reshape2)
# Row key = position within each topic (1, 2, 1, 2, ...); [, -1] drops that key column
dcast(df, ave(df$topic, df$topic, FUN = seq_along) ~ topic, value.var = "term")[, -1]
#     1   2    3
#1 will  go good
#2  eat fun  bad
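The key step in the `dcast()` call is the `ave()` row key. The sketch below reuses that key in base R with `tapply()`, assuming (as in the sample) that rows are already sorted by descending score within each topic and that every (rank, topic) cell holds exactly one term. The names `rank_in_topic` and `wide` are illustrative only.

```r
df <- data.frame(topic = c(1, 1, 2, 2, 3, 3),
                 term  = c("will", "eat", "go", "fun", "good", "bad"),
                 score = c(0.3, 0.2, 0.5, 0.4, 0.1, 0.05),
                 stringsAsFactors = FALSE)

# Position of each row within its topic: 1, 2, 1, 2, 1, 2.
# This is the same helper the dcast() formula uses as its row key.
rank_in_topic <- ave(df$topic, df$topic, FUN = seq_along)

# Cross-tabulate term by (rank, topic). Each cell holds exactly one term here,
# so tapply() simplifies the result to a character matrix.
wide <- tapply(df$term, list(rank_in_topic, df$topic), identity)
colnames(wide) <- paste0("topic", colnames(wide))
wide
```

If some topic had fewer than n terms, the missing cells would come back as `NA` rather than shifting the other columns.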

OR

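library(dplyr)
# Split by topic, keep each piece's "term" column, and bind the pieces side by side.
# The duplicate "term" names get repaired; the exact new names depend on the dplyr version.
bind_cols(lapply(split(df, df$topic), function(a) a["term"]))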
#  term term1 term2
#1 will    go  good
#2  eat   fun   bad
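With tidyr 1.0.0 or later, `pivot_wider()` can perform the same reshape while ranking by `score` explicitly instead of relying on row order. This is a hedged sketch rather than part of the original answer; it assumes both dplyr and tidyr are installed, and the helper column `rank` is a hypothetical name that is dropped at the end.

```r
library(dplyr)
library(tidyr)

df <- data.frame(topic = c(1, 1, 2, 2, 3, 3),
                 term  = c("will", "eat", "go", "fun", "good", "bad"),
                 score = c(0.3, 0.2, 0.5, 0.4, 0.1, 0.05),
                 stringsAsFactors = FALSE)

wide <- df %>%
  arrange(topic, desc(score)) %>%     # highest-scoring term first within each topic
  group_by(topic) %>%
  mutate(rank = row_number()) %>%     # hypothetical helper: position within topic
  ungroup() %>%
  select(-score) %>%                  # scores are not needed in the wide view
  pivot_wider(names_from = topic, values_from = term, names_prefix = "topic") %>%
  select(-rank)                       # drop the helper row key
wide
```

Sorting by `desc(score)` before numbering means the highest-scoring term still lands in the first row even if the input rows arrive in a different order.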
d.b