
I am developing a routine to automatically define several corpora in quanteda. I have several parameters controlling the script and one of them is the name of the corpus that will be generated. I can easily create a corpus programmatically with the function assign() but I completely fail to add any docvars to it.

Once I define the corpus, I usually invoke it throughout the code with the function get(). I have been using this approach quite extensively with success. For some reason, the function docvars() does not seem to accept an object invoked with get().

Please, have a look at the simple code below where I define the corpus and then try to associate a docvar to it.

library(quanteda)
#> Package version: 2.1.2
#> Parallel computing: 2 of 16 threads used.
#> See https://quanteda.io for tutorials and examples.
#> 
#> Attaching package: 'quanteda'
#> The following object is masked from 'package:utils':
#> 
#>     View

nameofthecorpus = "mycorpus"
mytext <- c( "This is a long speech made in 2020",
             "This is another long speech made in 2021")
thedocvars <- c( "2020", "2021" )

assign( nameofthecorpus, corpus( mytext ) )

# I can have a summary of the corpus with get()
summary( get( nameofthecorpus )  )
#> Corpus consisting of 2 documents, showing 2 documents:
#> 
#>   Text Types Tokens Sentences
#>  text1     8      8         1
#>  text2     8      8         1

# Now I want to add some docvars automatically
# This is where I get stuck
docvars( get( nameofthecorpus ), "year" ) <- thedocvars 
#> Error in docvars(get(nameofthecorpus), "year") <- thedocvars: could not find function "get<-"

Created on 2021-02-17 by the reprex package (v1.0.0)

In principle, I would like to generalize this to multiple docvars at once (e.g., like when they are stored in a data.frame).

Any suggestion?

Francesco Grossetti

1 Answer


First, I would very strongly suggest you avoid get and assign where possible for manipulating variables like this. It's a very indirect approach and, as you've already seen, it breaks easily when you try to update an object that you only reach through its name. When you run something like

docvars( mycorpus, "year" ) <- thedocvars 

you are running a special replacement function called docvars<- that returns a new object, which then replaces the value stored in mycorpus. When you put get( nameofthecorpus ) in that position, it's not a variable whose value can be replaced; it's a function call that returns a value. R therefore looks for a replacement function named get<-, which does not exist, hence the error. So if you need to use get/assign, you'd have to do something like this

assign(nameofthecorpus, `docvars<-`(get( nameofthecorpus ), "year", thedocvars))

Here you retrieve the value by name, explicitly call the replacement version of the docvars function to get an updated object, and then reassign that object to the original variable name.
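The same retrieve, replace, reassign pattern can be tried without quanteda. The sketch below is only an illustration: it uses base R's `names<-` as a stand-in for `docvars<-`, and `objname` and `myvector` are hypothetical names that are not part of the question.

```r
# Base-R sketch: `names<-` stands in for quanteda's `docvars<-`.
objname <- "myvector"            # hypothetical object name
assign(objname, c(1, 2, 3))

# names(get(objname)) <- c("a", "b", "c") would fail, because R would
# look for a replacement function called `get<-`, which does not exist.

# Calling the replacement function directly returns the updated object...
updated <- `names<-`(get(objname), c("a", "b", "c"))
# ...which then has to be stored back under the same name.
assign(objname, updated)

names(get(objname))
```

The last line should show the three new names, confirming that the object stored under the dynamic name was updated.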

A better alternative to get/assign is usually a named list. Something like

nameofthecorpus = "mycorpus"
mytext <- c( "This is a long speech made in 2020",
             "This is another long speech made in 2021")
thedocvars <- c( "2020", "2021" )

mydata <- list()
mydata[[nameofthecorpus]] <- corpus( mytext )
summary( mydata[[nameofthecorpus]]  )
docvars( mydata[[nameofthecorpus]], "year" ) <- thedocvars 
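For the case of several docvars stored in a data.frame, a possible sketch follows. It assumes that leaving out the field name in `docvars<-` and assigning a data.frame with one row per document sets all docvars at once; `mymeta` and its `speaker` column are made up for illustration.

```r
library(quanteda)

nameofthecorpus <- "mycorpus"
mytext <- c("This is a long speech made in 2020",
            "This is another long speech made in 2021")

# Hypothetical metadata: one row per document, one column per docvar.
mymeta <- data.frame(year = c("2020", "2021"),
                     speaker = c("A", "B"))

mydata <- list()
mydata[[nameofthecorpus]] <- corpus(mytext)

# Assumption: with no field given, a data.frame on the right-hand side
# replaces the whole set of docvars.
docvars(mydata[[nameofthecorpus]]) <- mymeta

# The list element is still a quanteda corpus object.
class(mydata[[nameofthecorpus]])
docvars(mydata[[nameofthecorpus]])
```

Storing the corpus in a list does not change its class, so any quanteda function that expects a corpus can be applied to `mydata[[nameofthecorpus]]` directly.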
MrFlick
  • Thank you! So there is no direct way to programmatically create an object and retain the correct class. I mean, I'd like a `corpus` object at the end of the script. – Francesco Grossetti Feb 18 '21 at 06:57
  • One additional point. doing `save( mydata[[ nameofthecorpus ]], file = "thepathtofile.RData" )` does not work with `object "mydata[[ nameofthecorpus ]]" not found`. How do I save an object defined in a list? – Francesco Grossetti Feb 18 '21 at 07:22
  • It’s not clear to me what you are asking. I’m unclear how you are consuming this “script”. Perhaps this could all be avoided by just writing proper functions that return the final desired object rather than injecting it into an environment as a side effect. – MrFlick Feb 18 '21 at 07:22
  • If you are trying to save a single object, `saveRDS` would be a better choice. Then you don’t have to hard code a variable name at all. – MrFlick Feb 18 '21 at 07:24
  • I am using standard functions defined in **quanteda**. The only difference is that I want to be able to pass names dynamically. What you suggested worked out great. I can access the object of the wanted class no problem except when I want to save them (pls see comment above). All in all, I don't want to interact with the script. This is launched via `Rscript` or via Jobs in RStudio. Thank you – Francesco Grossetti Feb 18 '21 at 07:25
  • For the record, using `saveRDS()` works. I will change the way I load in data accordingly. Thanks! – Francesco Grossetti Feb 18 '21 at 07:33
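The `saveRDS()` route from the comments can be sketched as follows. To keep the example runnable without quanteda, a character vector stands in for the corpus, and the file location under `tempdir()` is a hypothetical choice.

```r
nameofthecorpus <- "mycorpus"
mydata <- list()
# Placeholder value; in the real script this would be the corpus object.
mydata[[nameofthecorpus]] <- c("first document", "second document")

# Hypothetical output path built from the dynamic name.
outfile <- file.path(tempdir(), paste0(nameofthecorpus, ".rds"))

# saveRDS() stores the value of an expression, so a list element works.
# save() expects the names of existing variables, which is why
# save(mydata[[nameofthecorpus]], ...) reported "object not found".
saveRDS(mydata[[nameofthecorpus]], file = outfile)

# readRDS() returns the object, to be assigned to whatever name is convenient.
restored <- readRDS(outfile)
identical(restored, mydata[[nameofthecorpus]])
```

Because `readRDS()` returns the object instead of restoring it under a fixed variable name, the loading script is free to choose where to keep it.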