
I am working on a text mining project and have created a sparse matrix in R using the tm package. The data is in the following format:

[Image: sample data format]

I want it in this format:

[Image: resultant data format]

I need help with the data wrangling.
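A minimal sketch of getting from a tm sparse matrix to an ordinary data frame of word counts, which is the layout the answer's sample data uses. The documents and the C1/C2/C3 metadata values are hypothetical, chosen so the word counts match that sample data. This assumes the tm package is installed and relies on tm's default tokenizer settings. Note that `as.matrix()` densifies the whole matrix, so for a large corpus it can use a lot of memory.

```r
library(tm)

# Hypothetical documents whose word counts match the answer's sample data
docs <- c("not able", "not", "", "not", "not cant", "great", "cant great")

corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus)  # sparse: one row per document, one column per term

# Densify into a regular matrix of term counts
counts <- as.matrix(dtm)

# Attach per-document metadata columns (illustrative values, not from the question)
df <- data.frame(
  C1 = c(1, 2, 3, 2, 3, 2, 1),
  C2 = factor(c("a", "b", "c", "b", "c", "d", "a")),
  C3 = c(2, 3, 2, 3, 1, 2, 1),
  counts,
  row.names = NULL
)
df
```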

    Welcome to Stack Overflow. Please take a look at these tips on how to produce a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo Dec 07 '16 at 14:25

1 Answer


One idea, using dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
 group_by(C1, C2, C3) %>% 
 summarise_each(funs(sum)) %>% 
 gather(word, freq, not:great)

#Source: local data frame [24 x 5]
#Groups: C1, C2 [4]

#      C1     C2    C3  word  freq
#   <dbl> <fctr> <dbl> <chr> <dbl>
#1      1      a     1   not     0
#2      1      a     2   not     1
#3      2      b     3   not     2
#4      2      d     2   not     0
#5      3      c     1   not     1
#6      3      c     2   not     0
#7      1      a     1  cant     1
#8      1      a     2  cant     0
#9      2      b     3  cant     0
#10     2      d     2  cant     0
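With current package versions, `summarise_each()` and `funs()` are deprecated in dplyr and `gather()` is superseded in tidyr. A roughly equivalent pipeline, assuming dplyr >= 1.0 (for `across()` and `.groups`) and tidyr >= 1.0 (for `pivot_longer()`), might look like the sketch below. The data frame is rebuilt inline from the `dput()` output so the block runs on its own. One difference: `pivot_longer()` keeps each group's words together, so rows come out interleaved (not, cant, able, great per group) instead of stacked word by word as `gather()` does.

```r
library(dplyr)
library(tidyr)

# Same data as the dput() output in the answer
df <- data.frame(
  C1 = c(1, 2, 3, 2, 3, 2, 1),
  C2 = factor(c("a", "b", "c", "b", "c", "d", "a")),
  C3 = c(2, 3, 2, 3, 1, 2, 1),
  not = c(1, 1, 0, 1, 1, 0, 0),
  cant = c(0, 0, 0, 0, 1, 0, 1),
  able = c(1, 0, 0, 0, 0, 0, 0),
  great = c(0, 0, 0, 0, 0, 1, 1)
)

long <- df %>%
  group_by(C1, C2, C3) %>%
  # Sum every non-grouping column within each (C1, C2, C3) group
  summarise(across(everything(), sum), .groups = "drop") %>%
  # Wide word columns -> one row per group and word
  pivot_longer(not:great, names_to = "word", values_to = "freq")

long
```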

DATA

dput(df)
structure(list(C1 = c(1, 2, 3, 2, 3, 2, 1), C2 = structure(c(1L, 
2L, 3L, 2L, 3L, 4L, 1L), .Label = c("a", "b", "c", "d"), class = "factor"), 
    C3 = c(2, 3, 2, 3, 1, 2, 1), not = c(1, 1, 0, 1, 1, 0, 0), 
    cant = c(0, 0, 0, 0, 1, 0, 1), able = c(1, 0, 0, 0, 0, 0, 
    0), great = c(0, 0, 0, 0, 0, 1, 1)), .Names = c("C1", "C2", 
"C3", "not", "cant", "able", "great"), row.names = c(NA, -7L), class = "data.frame")
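If adding package dependencies is a concern, the same group-and-sum followed by a wide-to-long reshape can be sketched in base R. This is a swapped-in technique using `aggregate()`, not what the answer above uses; the `words` vector naming the count columns is an assumption you would adapt to your own term columns.

```r
# Same data as the dput() output above
df <- data.frame(
  C1 = c(1, 2, 3, 2, 3, 2, 1),
  C2 = factor(c("a", "b", "c", "b", "c", "d", "a")),
  C3 = c(2, 3, 2, 3, 1, 2, 1),
  not = c(1, 1, 0, 1, 1, 0, 0),
  cant = c(0, 0, 0, 0, 1, 0, 1),
  able = c(1, 0, 0, 0, 0, 0, 0),
  great = c(0, 0, 0, 0, 0, 1, 1)
)
words <- c("not", "cant", "able", "great")  # assumed list of term columns

# Sum each word column within every (C1, C2, C3) combination
agg <- aggregate(df[words], by = df[c("C1", "C2", "C3")], FUN = sum)

# Reshape wide -> long: repeat the key columns once per word and
# stack the word columns in the same column-by-column order
long <- data.frame(
  agg[rep(seq_len(nrow(agg)), times = length(words)), c("C1", "C2", "C3")],
  word = rep(words, each = nrow(agg)),
  freq = unlist(agg[words], use.names = FALSE),
  row.names = NULL
)
long
```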
Sotos