How can I transform a text analytics data frame from wide to long format in R?

Question

I am trying to analyse some comments using the tm and SnowballC packages in R. I have the output in the following format:

```r
structure(list(Emp.or.man.or.big.idea = c(1, 2, 1, 2, 3), Sentiment = c(0, 
-1, 1, 0, -1), x1st = c(0, 0, 1, 0, 0), x2nd = c(0, 0, 1, 0, 
0), accept = c(0, 0, 0, 1, 0)), .Names = c("Emp.or.man.or.big.idea", 
"Sentiment", "x1st", "x2nd", "accept"), row.names = c(NA, -5L
), class = "data.frame")
```

My first column indicates whether the comment was made by a manager or an employee, or is a big idea. My second column indicates whether the sentiment is positive, negative or neutral. From the third column onward, each column is a specific word, and its value is the number of mentions for a specific employee/manager/big idea and a specific sentiment (0/1/-1).

I am trying to understand how to turn the column names into row values. I have tried the reshape package, but I have not been able to pull it off. I have 237 observations and 464 variables, so I am not sure how to transform the data from column 3 onward so that there is one row per variable for each unique manager/employee/big idea and each unique sentiment (1/0/-1), and the same for every other variable from 3 to 464. A simple transpose can't do the trick in this case.

The desired outcome is in this format:

```r
structure(list(Emp.or.man.or.big.idea = c(1, 1, 1, 2, 2, 2, 3, 
3, 3), Sentiment = c(0, -1, 1, 0, -1, 1, 0, -1, 1), words = structure(c(2L, 
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("accept", "x1st", 
"x2nd"), class = "factor"), num.mentions = c(2, 1, 3, 4, 2, 3, 
2, 5, 4)), .Names = c("Emp.or.man.or.big.idea", "Sentiment", "words", 
"num.mentions"), row.names = c(NA, -9L), class = "data.frame")
```
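A minimal base-R sketch of the wide-to-long step, using the five-row sample from the question. It assumes that every column other than `Emp.or.man.or.big.idea` and `Sentiment` is a word-count column, so on the full data it would stack all word columns (3 to 464) without naming them. The names `id_cols`, `word_cols` and `long` are illustrative only; the result is roughly what `melt()` with those two ID columns produces, not the reshape package's own code.

```r
# Sample data from the question's dput() output
df <- data.frame(
  Emp.or.man.or.big.idea = c(1, 2, 1, 2, 3),
  Sentiment = c(0, -1, 1, 0, -1),
  x1st = c(0, 0, 1, 0, 0),
  x2nd = c(0, 0, 1, 0, 0),
  accept = c(0, 0, 0, 1, 0)
)

# Assumption: the two ID columns are fixed and every other column is a word count
id_cols <- c("Emp.or.man.or.big.idea", "Sentiment")
word_cols <- setdiff(names(df), id_cols)
n_words <- length(word_cols)

# Stack the word columns: one output row per (original row, word) pair
long <- data.frame(
  Emp.or.man.or.big.idea = rep(df$Emp.or.man.or.big.idea, times = n_words),
  Sentiment = rep(df$Sentiment, times = n_words),
  # each word name is repeated once per original row
  words = rep(word_cols, each = nrow(df)),
  # unlist() concatenates the word columns in the same order as word_cols
  num.mentions = unlist(df[word_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

On the full data this would give 237 × 462 = 109,494 rows, most of them zeros.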

Please don't post pictures of data. Use `dput` instead. Also, please include the expected result. — Sotos, May 30 '16 at 13:16
[Here are a few ideas](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of how to convey information or ideas. — Roman Luštrik, May 30 '16 at 13:36
Your output does not make much sense to me given your input. Where does `num.mentions` come from? Is `melt(dt, id.vars = c("Emp.or.man.or.big.idea", "Sentiment"))` what you want? — Chris, May 30 '16 at 18:43
@Chris, thanks, it partially helps. `num.mentions` is an example count of the variables `x1st`, `x2nd` and `accept`, although you are right that it's redundant and more confusing than helpful. Basically, the problem with `melt` is that I need to transpose the variables to rows and count them only where they equal 1. A value of 0 means the word was not mentioned for that combination of `Emp.or.man.or.big.idea` and `Sentiment`, so I don't need those rows. — Martin Petrov, May 30 '16 at 20:19
I am trying `test=recast(new_df,variable>1~Emp.or.man.or.big.idea+Sentiment,id.var=1:2)`, but it's not working for me. I want to transpose a measurement column into a row only where its value is 1. — Martin Petrov, May 31 '16 at 07:47
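One way to drop the unmentioned words is to stack the word columns into long format first and then keep only rows with a non-zero count. The base-R sketch below does that on the question's sample and then uses `aggregate()` to total the mentions per group/sentiment/word, which gives a `num.mentions`-style column. It assumes a value greater than 0 means the word was mentioned; if the cells are strictly 0/1 flags and only 1 should count, `== 1` would work the same way. The helper names are illustrative.

```r
# Sample data from the question's dput() output
df <- data.frame(
  Emp.or.man.or.big.idea = c(1, 2, 1, 2, 3),
  Sentiment = c(0, -1, 1, 0, -1),
  x1st = c(0, 0, 1, 0, 0),
  x2nd = c(0, 0, 1, 0, 0),
  accept = c(0, 0, 0, 1, 0)
)

id_cols <- c("Emp.or.man.or.big.idea", "Sentiment")
word_cols <- setdiff(names(df), id_cols)  # assumed: all remaining columns are words
n_words <- length(word_cols)

# Wide to long: one row per (comment, word) pair
long <- data.frame(
  Emp.or.man.or.big.idea = rep(df$Emp.or.man.or.big.idea, times = n_words),
  Sentiment = rep(df$Sentiment, times = n_words),
  words = rep(word_cols, each = nrow(df)),
  num.mentions = unlist(df[word_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only words that were actually mentioned (assumption: > 0 means mentioned)
long <- long[long$num.mentions > 0, ]
rownames(long) <- NULL

# Total mentions per group / sentiment / word combination
counts <- aggregate(num.mentions ~ Emp.or.man.or.big.idea + Sentiment + words,
                    data = long, FUN = sum)
print(counts)
```

With the sample data only three rows survive: `x1st` and `x2nd` for group 1 with sentiment 1, and `accept` for group 2 with sentiment 0.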
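The `recast()` formula above appears to aim for words as rows and each manager/employee/big idea plus sentiment combination as a column. A base-R alternative sketch (not the reshape package's `recast()`) builds that table with `xtabs()` after stacking and dropping zero counts. `group_sentiment` is a hypothetical helper column that pastes the two ID values together; combinations with no mentions are filled with 0.

```r
# Sample data from the question's dput() output
df <- data.frame(
  Emp.or.man.or.big.idea = c(1, 2, 1, 2, 3),
  Sentiment = c(0, -1, 1, 0, -1),
  x1st = c(0, 0, 1, 0, 0),
  x2nd = c(0, 0, 1, 0, 0),
  accept = c(0, 0, 0, 1, 0)
)

id_cols <- c("Emp.or.man.or.big.idea", "Sentiment")
word_cols <- setdiff(names(df), id_cols)  # assumed: all remaining columns are words
n_words <- length(word_cols)

# Wide to long: one row per (comment, word) pair
long <- data.frame(
  Emp.or.man.or.big.idea = rep(df$Emp.or.man.or.big.idea, times = n_words),
  Sentiment = rep(df$Sentiment, times = n_words),
  words = rep(word_cols, each = nrow(df)),
  num.mentions = unlist(df[word_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop words that were not mentioned for that comment
long <- long[long$num.mentions > 0, ]

# Hypothetical helper column, e.g. "1_1" = group 1 with sentiment 1
long$group_sentiment <- paste(long$Emp.or.man.or.big.idea, long$Sentiment, sep = "_")

# Words as rows, group/sentiment combinations as columns, summed mentions as cells
tab <- xtabs(num.mentions ~ words + group_sentiment, data = long)
print(tab)
```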
