
Here's the data frame I'm trying to pivot, or rather, reshape:

    Value    Word list
    1        c("cat", "dog")
    1        c("apple", "banana")
    2        c("cat", "dog")
    2        c("peach", "orange")
    3        c("cat", "dog")
    3        c("berries", "coconut")

Here's the desired outcome (basically just combining elements with the same Value to get one big list by Value):

    Value    Word list
    1        c("cat", "dog", "apple", "banana")
    2        c("cat", "dog", "peach", "orange")
    3        c("cat", "dog", "berries", "coconut")

Thanks in advance to anyone who can offer help (and thank you to everyone who has already commented on or edited my poor post for me).

To give you an idea of why I have a list in my data frame: I'm doing part-of-speech tagging. After splitting the comment column with str_split, I ended up with a list column, because the comments vary in length. Each comment comes with a score, and I need to build a bag-of-words data frame grouped by score.
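
A minimal sketch of how splitting text produces a list column. It uses base R's strsplit in place of str_split, and the comments and scores are made up for illustration, not taken from the real data:

```r
# Hypothetical comments of different lengths, each with a score
comments <- c("cat dog", "apple banana cherry")
score <- c(1, 2)

# Splitting returns a list because each comment yields a different number of words
tokens <- strsplit(comments, " ", fixed = TRUE)

# I() stores the list as-is, giving a data frame with one list cell per row
df <- data.frame(score = score, words = I(tokens))
str(df)
```

Each row of `words` holds a character vector of whatever length that comment produced, which is why the column cannot be a plain atomic vector.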

Per your request, > str(df1):

'data.frame':   6 obs. of  2 variables:
 $ Value   : num  1 1 2 2 3 3
 $ Wordlist:List of 6
  ..$ : chr  "cat" "dog"
  ..$ : chr  "apple" "banana"
  ..$ : chr  "cat" "dog"
  ..$ : chr  "peach" "orange"
  ..$ : chr  "cat" "dog"
  ..$ : chr  "berries" "coconut"
  ..- attr(*, "class")= chr "AsIs"

And > dput(df1):

structure(list(Value = c(1, 1, 2, 2, 3, 3), Wordlist = structure(list(
c("cat", "dog"), c("apple", "banana"), c("cat", "dog"), c("peach", 
"orange"), c("cat", "dog"), c("berries", "coconut")), class = "AsIs")), .Names = c("Value", "Wordlist"), row.names = c(NA, -6L), class = "data.frame")
gogolaygo
  • @zx8754 Hi there, 4 down votes because of spelling issues? I'm not even a native English speaker... why so strict? And why can't I thank people in advance? – gogolaygo Feb 24 '16 at 22:25
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Feb 24 '16 at 22:39
  • I doubt the downvotes are because of spelling. I tried to make the post more readable and succinct, with no clutter (yes, this includes "thank you"; feel free to revert to the original). Even after my edit, your post doesn't meet the minimum requirements for a good post. Read the posts linked in the comment above. – zx8754 Feb 24 '16 at 22:43
  • I find it super intimidating to post questions here @zx8754, especially for new users. Now that my post is downvoted 5 times, more likely than not I'm not going to get an answer here. Does that mean I have to edit it till it's a good question? My guess would be nobody is going to look at this post even if I edited it now that it got 5 downvotes. – gogolaygo Feb 24 '16 at 22:53
  • Yes, update your post to meet the minimum requirements, and you will most likely get answers. Users who downvoted cannot upvote until your post has been edited. – zx8754 Feb 24 '16 at 23:05
  • @rawr Your edit is not the OP's data. We are still waiting for the OP to improve the question. –  Feb 24 '16 at 23:10
  • thank you all for helping here, hope my new edits make more sense now. – gogolaygo Feb 24 '16 at 23:33
  • It would be really helpful if you could show the results of `str(your_data)` or `dput(your_data)`; data frames containing lists are unusual in R, and the correct answer will really depend on the *actual* structure of the data. @rawr tried to guess. @michaelchirico, how did you set up `df` ... ? – Ben Bolker Feb 25 '16 at 00:08
  • @BenBolker very true. I followed [this](http://stackoverflow.com/questions/9547518/creating-a-data-frame-where-a-column-is-a-list). – MichaelChirico Feb 25 '16 at 03:32
  • I won't upvote yet; your question is no longer worth downvotes, but it's not yet worth upvotes. If you edit the question to show the results of `str()` or `dput()` on your data, I'll upvote. – Ben Bolker Feb 25 '16 at 14:38

3 Answers

6

Here I used data.table:

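library(data.table)
setDT(df)  # convert df to a data.table by reference

# within each Value, flatten the list column into one vector, then re-wrap it in a list
df[, .(word_list = list(unlist(Word.list))), by = Value]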
#    Value               word_list
# 1:     1    cat,dog,apple,banana
# 2:     2    cat,dog,peach,orange
# 3:     3 cat,dog,berries,coconut

unlist works recursively to pull all elements of Word.list within each Value into a single vector. We then wrap that vector in list() so it fits in a single cell, and finally wrap everything in a named list to create the column (the outer list() call is hidden because .() is an alias for list() in data.table). Could have used list(word_list = ...) but I figured the word list has had enough attention for one answer.
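
To make the unlist-then-list step concrete, here is a small base R sketch of what happens inside one group. The object names (group1, flat, cell) are hypothetical, and the data mirrors the Value == 1 rows from the question:

```r
# The two list elements that belong to Value == 1
group1 <- list(c("cat", "dog"), c("apple", "banana"))

# unlist() flattens the list of character vectors into one character vector
flat <- unlist(group1)

# Wrapping in list() turns that vector into a single list element,
# which is what allows it to occupy one cell of a list column
cell <- list(flat)

length(flat)  # number of words in the group
length(cell)  # one list element holding all of them
```

Without the inner list(), each word would get its own row in the grouped result instead of all words sharing one cell.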

MichaelChirico
6

Base R solution, using @akrun's data setup:

aggregate(df1$Wordlist,list(df1$Value),unlist,simplify=FALSE)

The other solutions will probably be faster, in case that matters.
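
For comparison, the same grouping idea can be sketched with split() and lapply(). This is an alternative to the aggregate call above, not a restatement of it; it works on a plain list and vector (hypothetical names words and value) rather than on df1 directly:

```r
# Same values as the question's data, held as a plain vector and a plain list
value <- c(1, 1, 2, 2, 3, 3)
words <- list(c("cat", "dog"), c("apple", "banana"),
              c("cat", "dog"), c("peach", "orange"),
              c("cat", "dog"), c("berries", "coconut"))

# split() groups the list elements by value; unlist() flattens each group
combined <- lapply(split(words, value), unlist)

# Rebuild a data frame with one row per value and a list column of words
result <- data.frame(Value = as.numeric(names(combined)))
result$Wordlist <- unname(combined)
str(result)
```

The group labels come back as the names of `combined`, so they are converted to numeric to match the original Value column.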

Ben Bolker
5

We could also use dplyr/tidyr

library(dplyr)
library(tidyr)
unnest(df1, Wordlist) %>%
  group_by(Value) %>%
  nest(Wordlist)

data

df1 <- data.frame(Value = c(1, 1, 2, 2, 3, 3),
    Wordlist = I(list(c('cat', 'dog'), c("apple", "banana") , 
 c("cat", "dog") , c("peach", "orange") , c("cat", "dog") , 
 c("berries", "coconut"))))
akrun