How can I turn the output of kwic into a corpus for further analysis? More specifically, I want to create a corpus based on the words coming before and after a keyword (contextPre, contextPost) to do further sentiment analysis on them.
Just grab the contextPre, keyword, and contextPost, paste them together, and put them into a corpus, no? Please have a look at [how to make a reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and show us what you have tried. For your case a small reproducible example might be something like `head(kwic(inaugTexts, "security", window = 3))` – Jota May 25 '16 at 19:31
That worked, and it was straightforward. Thank you! – DebNa May 25 '16 at 21:47
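For reference, the approach suggested in the comment above — pasting the pre-context, the keyword, and the post-context back together before building the corpus — might look like the following sketch. It uses the same quanteda 0.9.x-era API as the rest of this page; later quanteda versions expect a tokens object as the input to `kwic()`:

```r
require(quanteda)

mykwic <- kwic(data_corpus_inaugural, "security", window = 3)

# paste the pre-context, the keyword itself, and the post-context
# into one string per match, then build a corpus from those strings
full_context <- paste(mykwic$pre, mykwic$keyword, mykwic$post)
mycorpus <- corpus(full_context)
summary(mycorpus)
```

This keeps each keyword occurrence, together with its surrounding words, as one document — convenient if the keyword itself should count toward the sentiment score.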
1 Answer
Simplest way: create a pre-context and a post-context corpus, each with a document variable (docvar) identifying the context, and then merge the two corpora with a `+` operation.
require(quanteda)
mykwic <- kwic(data_corpus_inaugural, "terror")
# make a corpus with the pre-word context
mycorpus <- corpus(mykwic$pre)
docvars(mycorpus, "context") <- "pre"
# make a corpus with the post-word context
mycorpus2 <- corpus(mykwic$post)
docvars(mycorpus2, "context") <- "post"
# combine the two corpora
mycorpus <- mycorpus + mycorpus2
summary(mycorpus)
# Corpus consisting of 16 documents.
#
# Text Types Tokens Sentences context
# text1 5 5 1 pre
# text2 4 5 1 pre
# text3 5 5 1 pre
# text4 5 5 1 pre
# text5 5 5 1 pre
# text6 5 5 1 pre
# text7 5 5 1 pre
# text8 5 5 1 pre
# text11 4 5 1 post
# text21 5 5 1 post
# text31 5 5 1 post
# text41 5 5 1 post
# text51 5 5 1 post
# text61 5 5 2 post
# text71 5 5 2 post
# text81 5 5 1 post
#
# Source: Combination of corpuses mycorpus and mycorpus2
# Created: Wed May 25 23:35:54 2016
# Notes:
Added: As of v0.9.7-6, quanteda has a method to construct a corpus directly from a kwic object. So this now works:
mykwic <- kwic(data_corpus_inaugural, "southern")
summary(corpus(mykwic))
# Corpus consisting of 28 documents.
#
# Text Types Tokens Sentences docname position keyword context
# text1.pre 5 5 1 1797-Adams 1807 southern pre
# text2.pre 4 5 1 1825-Adams 2434 southern pre
# text3.pre 4 5 1 1861-Lincoln 98 Southern pre
# text4.pre 5 5 1 1865-Lincoln 283 southern pre
# text5.pre 5 5 1 1877-Hayes 378 Southern pre
# text6.pre 5 5 1 1877-Hayes 956 Southern pre
# text7.pre 5 5 1 1877-Hayes 1250 Southern pre
# text8.pre 5 5 1 1881-Garfield 1007 Southern pre
# text9.pre 4 5 1 1909-Taft 4029 Southern pre
# text10.pre 5 5 1 1909-Taft 4230 Southern pre
# text11.pre 5 5 1 1909-Taft 4350 Southern pre
# text12.pre 5 5 1 1909-Taft 4537 Southern pre
# text13.pre 5 5 1 1909-Taft 4597 Southern pre
# text14.pre 5 5 1 1953-Eisenhower 1226 southern pre
# text1.post 5 5 1 1797-Adams 1807 southern post
# text2.post 5 5 1 1825-Adams 2434 southern post
# text3.post 5 5 1 1861-Lincoln 98 Southern post
# text4.post 5 5 2 1865-Lincoln 283 southern post
# text5.post 5 5 2 1877-Hayes 378 Southern post
# text6.post 5 5 1 1877-Hayes 956 Southern post
# text7.post 5 5 1 1877-Hayes 1250 Southern post
# text8.post 5 5 2 1881-Garfield 1007 Southern post
# text9.post 5 5 2 1909-Taft 4029 Southern post
# text10.post 5 5 1 1909-Taft 4230 Southern post
# text11.post 5 5 1 1909-Taft 4350 Southern post
# text12.post 5 5 1 1909-Taft 4537 Southern post
# text13.post 5 5 1 1909-Taft 4597 Southern post
# text14.post 5 5 1 1953-Eisenhower 1226 southern post
#
# Source: Corpus created from kwic(x, keywords = "southern")
# Created: Thu May 26 09:47:19 2016
# Notes:
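To continue toward the sentiment analysis the question asks about, one option is to apply a dictionary to a dfm built from the kwic corpus. The tiny two-category dictionary below is purely illustrative — a real lexicon would replace it — and the `dfm(x, dictionary = ...)` call matches the quanteda 0.9.x API used in this answer; later versions moved dictionary application to `tokens_lookup()` / `dfm_lookup()`:

```r
require(quanteda)

mykwic <- kwic(data_corpus_inaugural, "terror")
context_corpus <- corpus(mykwic)   # the v0.9.7-6 constructor shown above

# an illustrative two-category dictionary; substitute a real sentiment lexicon
sentdict <- dictionary(list(negative = c("fear", "war", "danger"),
                            positive = c("peace", "hope", "freedom")))

# count dictionary hits per context document
sentdfm <- dfm(context_corpus, dictionary = sentdict)
sentdfm
```

Because each pre- and post-context is its own document (with `context`, `docname`, and `position` docvars), the resulting counts can be compared before versus after the keyword, or aggregated per source text.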

Ken Benoit