I have a data frame that holds a message thread extracted from a discussion forum. Joining tables from a database gives me a structure that looks like this:

threadStarterName1    threadStarter1    comment1    commenterName1
threadStarterName1    threadStarter1    comment2    commenterName2
threadStarterName1    threadStarter1    comment3    commenterName3
threadStarterName1    threadStarter1    comment4    commenterName4
threadStarterName1    threadStarter1    comment5    commenterName5

Code to create this dataframe:

df <- data.frame(
  threadStarterName = c("threadStarterName1", "threadStarterName1", "threadStarterName1", "threadStarterName1", "threadStarterName1"),
  threadStarter = c("threadStarter1", "threadStarter1", "threadStarter1", "threadStarter1", "threadStarter1"),
  comment = c("comment1", "comment2", "comment3", "comment4", "comment5"),
  commenterName = c("commenterName1", "commenterName2", "commenterName3", "commenterName4", "commenterName5")
)

I want to reshape this data frame as follows, so that I can print it in an R Markdown report:

threadStarter1    threadStarterName1
   comment1       commenterName1
   comment2       commenterName2
   comment3       commenterName3
   comment4       commenterName4
   comment5       commenterName5

Thanks in advance!

  • Can you post your code so far? – GrandMasterFlush Oct 03 '16 at 16:19
  • http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Please start with a reproducible example. – Brandon Bertelsen Oct 03 '16 at 16:21
  • Some specific things that are unclear in this post (that using `dput()` or other recommendations from Brandon's link would fix): are `threadstarter` and `message1` the same column or different columns? Are `row1 row2`... `row.names` attributes or another column? What classes are your columns? Does this need to be generalized to multiple messages, or does the data frame only contain `message1`? And also, what have you tried? Where did it fail? How did you get stuck? – Gregor Thomas Oct 03 '16 at 16:25
  • threadstarter message1 is one column and comment1 is one column; apologies for the confusion, I only just realized it. It's basically one threadstarter column and one comment column. And yes, it must generalize to other threadstarters (message2, etc.). So far, I was thinking along the lines of extracting unique(df$threadstarter) and then matching it to df$comment, but this fails. – avkrishnan Oct 03 '16 at 18:06
  • I'm a new user, so I apologize for the incomplete posts. I have now added code to create the data frame in question. As for getting the output I want, I'm not sure where to start. – avkrishnan Oct 03 '16 at 18:20

1 Answer

If I understand correctly, the original thread post (and its author) is repeated on every row, and you want it to appear only once, in the same columns as the comment text and the comment authors.

If so, this should do:

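onlyOnce <-
  data.frame(
    # as.character() guards against factor columns (the data.frame() default
    # before R 4.0), which c() could otherwise turn into integer codes
    user = c(as.character(df$threadStarterName[1]),
             as.character(df$commenterName)),
    commentPosted = c(as.character(df$threadStarter[1]),
                      as.character(df$comment)),
    stringsAsFactors = FALSE
  )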

It takes the first thread author entry (and their post) and puts it at the top above the comment authors (and their comments).
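
Since the comments say this must generalize to several thread starters, the same idea can be applied once per thread. What follows is only a rough sketch and not part of the tested answer above: `flattenThreads` and the two-thread `df2` are hypothetical names made up for illustration, and it assumes the text in `threadStarter` uniquely identifies a thread.

```r
# Hypothetical helper: for every thread, put the starter's post (and author)
# on top of that thread's comments, then stack all threads together.
flattenThreads <- function(df) {
  # Assumption: the threadStarter text uniquely identifies a thread.
  key <- as.character(df$threadStarter)
  # Fix the level order so threads keep their order of first appearance.
  threads <- split(df, factor(key, levels = unique(key)))

  pieces <- lapply(threads, function(thread) {
    data.frame(
      user = c(as.character(thread$threadStarterName[1]),
               as.character(thread$commenterName)),
      commentPosted = c(as.character(thread$threadStarter[1]),
                        as.character(thread$comment)),
      stringsAsFactors = FALSE
    )
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "threadStarter1.1"-style row names
  out
}

# Made-up example data with two threads
df2 <- data.frame(
  threadStarterName = c("threadStarterName1", "threadStarterName1", "threadStarterName2"),
  threadStarter = c("threadStarter1", "threadStarter1", "threadStarter2"),
  comment = c("comment1", "comment2", "comment3"),
  commenterName = c("commenterName1", "commenterName2", "commenterName3"),
  stringsAsFactors = FALSE
)

result <- flattenThreads(df2)
print(result)
```

If two different threads can start with identical text, you would need a real thread identifier from the database as the split key instead.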

Mark Peterson