How can I turn the output of kwic into a corpus for further analysis? More specifically, I want to create a corpus based on the words coming before and after a keyword (contextPre, contextPost) to do further sentiment analysis on them.
Just grab the contextPre, keyword, and contextPost, paste them together, and put them into a corpus, no? Please have a look at [how to make a reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and show us what you have tried. For your case a small reproducible example might be something like `head(kwic(inaugTexts, "security", window = 3))` – Jota May 25 '16 at 19:31
That worked, and it was straightforward. Thank you! – DebNa May 25 '16 at 21:47
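For reference, the approach suggested in the comment above — pasting the pre-context, the keyword, and the post-context back together before building the corpus — might look like the following sketch. It uses the same quanteda 0.9.x-era API as the rest of this page; later quanteda versions expect a tokens object as the input to `kwic()`:

```r
require(quanteda)

mykwic <- kwic(data_corpus_inaugural, "security", window = 3)

# paste the pre-context, the keyword itself, and the post-context
# into one string per match, then build a corpus from those strings
full_context <- paste(mykwic$pre, mykwic$keyword, mykwic$post)
mycorpus <- corpus(full_context)
summary(mycorpus)
```

This keeps each keyword occurrence, together with its surrounding words, as one document — convenient if the keyword itself should count toward the sentiment score.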
1 Answer
Simplest way: create a pre-context and a post-context corpus, each with a document variable (docvar) identifying the context, and then merge the two corpora with a `+` operation.
require(quanteda)
mykwic <- kwic(data_corpus_inaugural, "terror")
# make a corpus with the pre-word context
mycorpus <- corpus(mykwic$pre)
docvars(mycorpus, "context") <- "pre"
# make a corpus with the post-word context
mycorpus2 <- corpus(mykwic$post)
docvars(mycorpus2, "context") <- "post"
# combine the two corpora
mycorpus <- mycorpus + mycorpus2
summary(mycorpus)
# Corpus consisting of 16 documents.
#
# Text Types Tokens Sentences context
# text1 5 5 1 pre
# text2 4 5 1 pre
# text3 5 5 1 pre
# text4 5 5 1 pre
# text5 5 5 1 pre
# text6 5 5 1 pre
# text7 5 5 1 pre
# text8 5 5 1 pre
# text11 4 5 1 post
# text21 5 5 1 post
# text31 5 5 1 post
# text41 5 5 1 post
# text51 5 5 1 post
# text61 5 5 2 post
# text71 5 5 2 post
# text81 5 5 1 post
#
# Source: Combination of corpuses mycorpus and mycorpus2
# Created: Wed May 25 23:35:54 2016
# Notes:
Added: As of v0.9.7-6, quanteda has a method to construct a corpus directly from a kwic object. So this now works:
mykwic <- kwic(data_corpus_inaugural, "southern")
summary(corpus(mykwic))
# Corpus consisting of 28 documents.
#
# Text Types Tokens Sentences docname position keyword context
# text1.pre 5 5 1 1797-Adams 1807 southern pre
# text2.pre 4 5 1 1825-Adams 2434 southern pre
# text3.pre 4 5 1 1861-Lincoln 98 Southern pre
# text4.pre 5 5 1 1865-Lincoln 283 southern pre
# text5.pre 5 5 1 1877-Hayes 378 Southern pre
# text6.pre 5 5 1 1877-Hayes 956 Southern pre
# text7.pre 5 5 1 1877-Hayes 1250 Southern pre
# text8.pre 5 5 1 1881-Garfield 1007 Southern pre
# text9.pre 4 5 1 1909-Taft 4029 Southern pre
# text10.pre 5 5 1 1909-Taft 4230 Southern pre
# text11.pre 5 5 1 1909-Taft 4350 Southern pre
# text12.pre 5 5 1 1909-Taft 4537 Southern pre
# text13.pre 5 5 1 1909-Taft 4597 Southern pre
# text14.pre 5 5 1 1953-Eisenhower 1226 southern pre
# text1.post 5 5 1 1797-Adams 1807 southern post
# text2.post 5 5 1 1825-Adams 2434 southern post
# text3.post 5 5 1 1861-Lincoln 98 Southern post
# text4.post 5 5 2 1865-Lincoln 283 southern post
# text5.post 5 5 2 1877-Hayes 378 Southern post
# text6.post 5 5 1 1877-Hayes 956 Southern post
# text7.post 5 5 1 1877-Hayes 1250 Southern post
# text8.post 5 5 2 1881-Garfield 1007 Southern post
# text9.post 5 5 2 1909-Taft 4029 Southern post
# text10.post 5 5 1 1909-Taft 4230 Southern post
# text11.post 5 5 1 1909-Taft 4350 Southern post
# text12.post 5 5 1 1909-Taft 4537 Southern post
# text13.post 5 5 1 1909-Taft 4597 Southern post
# text14.post 5 5 1 1953-Eisenhower 1226 southern post
#
# Source: Corpus created from kwic(x, keywords = "southern")
# Created: Thu May 26 09:47:19 2016
# Notes:
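To continue toward the sentiment analysis the question asks about, one option is to apply a dictionary to a dfm built from the kwic corpus. The tiny two-category dictionary below is purely illustrative — a real lexicon would replace it — and the `dfm(x, dictionary = ...)` call matches the quanteda 0.9.x API used in this answer; later versions moved dictionary application to `tokens_lookup()` / `dfm_lookup()`:

```r
require(quanteda)

mykwic <- kwic(data_corpus_inaugural, "terror")
context_corpus <- corpus(mykwic)   # the v0.9.7-6 constructor shown above

# an illustrative two-category dictionary; substitute a real sentiment lexicon
sentdict <- dictionary(list(negative = c("fear", "war", "danger"),
                            positive = c("peace", "hope", "freedom")))

# count dictionary hits per context document
sentdfm <- dfm(context_corpus, dictionary = sentdict)
sentdfm
```

Because each pre- and post-context is its own document (with `context`, `docname`, and `position` docvars), the resulting counts can be compared before versus after the keyword, or aggregated per source text.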

Ken Benoit