R: Count the Occurrence of words in list to create benchmark

Question

I have a list that consists of words:

$text
$text[[1]]
 [1] "qlikview" "gpa"      "access"   "gpa"      "access"   "access"   "qlikview" "gpa"      "access"  
[10] "gpa"     

$text[[2]]
 [1] "report"   "qlikview" "gpa"      "access"   "qlikview" "gpa"      "access"   "qlikview" "gpa"     
[10] "access"

$text[[3]]
 [1] "qlikview" "gpa"      "access"   "gpa"      "access"   "access"   "qlikview" "gpa"      "access"  
[10] "gpa"     

$text[[4]]
 [1] "qlikview" "gpa"      "access"   "gpa"      "access"   "access"   "qlikview" "gpa"      "access"  
[10] "gpa"     

$text[[5]]
 [1] "report"   "qlikview" "gpa"      "access"   "access"   "gpa"      "access"   "qlikview" "gpa"     
[10] "access"   "access"   "gpa"      "qlikview" "gpa"      "access"   "qlikview" "gpa"      "access"

I need to count how often each word occurs in each element of the list and then plot the counts. I have tried various approaches, but they only work within a single sentence. Please refer to this. Could somebody who has worked on something similar help?

edit

dput(O)
O <- structure(list(text = list(c("report", "gpa", "access", "access", 
                                  "access", "gpa", "access", "gpa", 
                                  "access"), c("report", "report", 
                                  "access", "report", "report", "data",  
                                  "report", "report"), 
                                c("report", "qlikview", "gpa", "access", 
                                  "access", "qlikview", "gpa", "access", 
                                  "access", "qlikview", "gpa", "access", 
                                  "access", "qlikview", "gpa", "access"), 
                                  character(0),
                                c("gpa", "gpa", "gpa", "gpa", "gpa", 
                                  "gpa", "gpa", "gpa", "gpa", "gpa", 
                                  "gpa", "gpa"), 
                                c("report", "qlikview", "gpa", "access", 
                                  "access", "qlikview", "gpa", "access", 
                                  "qlikview", "gpa", "access", "access", 
                                  "gpa", "qlikview", "gpa", "access"), 
                                c("report", "data", "data"), 
                                c("report", "report", "report", "data", 
                                  "report", "report"))), .Names = "text")

Try `library(qdapTools); mtabulate(yourlist)`. A `dput` output of the above example would have been easier to test. – akrun, Apr 20 '15 at 06:37
Please make this a reproducible example with an expected output. No one wants to have to recreate your data. Also, do you mean the number of unique words in each list, or the total number of words? – thelatemail, Apr 20 '15 at 06:38
Unique words. I'm sorry, `dput` just gives it as a list, so the data above won't be represented as it is. – KRU, Apr 20 '15 at 06:49
You can `dput` a smaller subset of your list, i.e. type `dput(head(yourlist))` on the R console and update your post by copy/pasting the `dput` output (though I don't know if this is still big enough). – akrun, Apr 20 '15 at 06:50
**You've essentially been [asking the same question multiple times for two weeks now](http://stackoverflow.com/questions/29530584/r-word-frequency#comment47226600_29530584)**. Could you at minimum explain in the title and question body how this question differs from previous askings and why the previous answers were unsuitable? It would help people if you linked to the previous askings and gave a summary of why those answers didn't solve it. – smci, Apr 20 '15 at 07:24
@smci Please read the questions carefully; the kind and depth of the answers may vary. I will definitely link the previous questions to this one. They solved specific, temporary cases, while this one asks for something that works in general. – KRU, Apr 20 '15 at 07:40
And not to forget, I have been working on the same data sets to identify the appropriate way to achieve the result! – KRU, Apr 20 '15 at 07:42

akrun · Accepted Answer · 2015-04-20T07:09:10.200

2

Try

library(qdapTools)
res <- mtabulate(O$text)
dim(res)
#[1] 244   8

head(res,3)
#   access adhoc data gpa maturity pfi qlikview report
#1      4     0    0   4        0   0        2      0
#2      3     0    0   3        0   0        3      1
#3      4     0    0   4        0   0        2      0

Based on the new dput output (on a small subset):

res1 <- mtabulate(O$text)
head(res1,3)
#  access data gpa qlikview report
#1      5    0   3        0      1
#2      1    1   0        0      6
#3      7    0   4        4      1
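If installing qdapTools is not an option, roughly the same table can be built in base R. The following is only a sketch and not part of the original answer: `words`, `lvls` and `counts` are illustrative names, and `words` holds three elements copied from the question's `dput` output (including the empty one) so that the snippet runs on its own.

```r
# Three elements taken from the question's dput output; the last one is empty
words <- list(
  c("report", "gpa", "access", "access", "access", "gpa", "access", "gpa", "access"),
  c("report", "report", "access", "report", "report", "data", "report", "report"),
  character(0)
)

# Shared set of words, so every element is counted over the same columns
lvls <- sort(unique(unlist(words)))

# One row per list element, one column per word; words that are absent count as 0
counts <- do.call(rbind, lapply(words, function(x) {
  tabulate(factor(x, levels = lvls), nbins = length(lvls))
}))
colnames(counts) <- lvls

counts
#      access data gpa report
# [1,]      5    0   3      1
# [2,]      1    1   0      6
# [3,]      0    0   0      0
```

Since the question also asks for a plot, one possible starting point is `barplot(t(counts), beside = TRUE, legend.text = TRUE)`, which draws one group of bars per list element and one bar per word.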

edited Apr 20 '15 at 07:09

answered Apr 20 '15 at 07:00

akrun


@KRU No problem. I noticed that you changed the `dput`. Updated my post accordingly. – akrun Apr 20 '15 at 07:09
