I am working with a text corpus in R using the quanteda package. I'm using the poliblogs2008.csv dataset (accessible from here) with the script below. The script used to run without problems, but now it throws an error when I try to get summary statistics for the corpus or build a document-feature matrix from it.

The script I run is the following:

require("quanteda")
require("readtext")
require("topicmodels")
require("stm")
blog.dat <- readtext("poliblogs2008.csv",
        textfield="documents")

meta.list <- list(blog=blog.dat$blog,       
        day=blog.dat$day,
        rating= blog.dat$rating)
names(meta.list)
blogcorpus <- corpus(blog.dat,
            meta=meta.list)
meta <- meta(blogcorpus)
mycorpus.stats <- summary(blogcorpus)
blog.dfm <- dfm(blogcorpus, remove=stopwords("english"),
                stem= TRUE,
                removePunct= TRUE)

The errors appear when I run the lines:

mycorpus.stats <- summary(blogcorpus)

and

blog.dfm <- dfm(blogcorpus, remove=stopwords("english"),
                stem= TRUE,
                removePunct= TRUE)

In both cases, the same error message appears:

Error in if (...length() && any(...names() == "Dimnames")) .Object@Dimnames <- fixupDN(.Object@Dimnames) : 
  missing value where TRUE/FALSE needed

The same error appears with several different datasets, so it doesn't seem to be data-dependent.

UPDATE: The problem turned out to be caused by my installation rather than by the script. Uninstalling and reinstalling R and RStudio made the error disappear. Thanks to everyone who looked and provided a solution.

A.G.
  • FYI, you are using `require` incorrectly, see https://stackoverflow.com/a/51263513/3358272, https://yihui.org/en/2014/07/library-vs-require/, https://r-pkgs.org/namespace.html#search-path. For what you're doing, use `library`, not `require`. – r2evans Jan 30 '23 at 02:45
  • Thank you for the clarification, which I have incorporated. Unfortunately, it doesn't change the error in question. – A.G. Jan 30 '23 at 08:40

2 Answers

require(quanteda) is fine, but you should update the packages. With a current version of quanteda, your dfm() call runs but produces deprecation warnings:

> blog.dfm <- dfm(blogcorpus, remove=stopwords("english"),
+                 stem= TRUE,
+                 removePunct= TRUE)
Warning messages:
1: 'dfm.corpus()' is deprecated. Use 'tokens()' first. 
2: removePunct argument is not used. 
3: removePunct argument is not used. 
4: 'remove' is deprecated; use dfm_remove() instead 
5: 'stem' is deprecated; use dfm_wordstem() instead 

It should be

> blog.toks <- tokens(blogcorpus, remove_punct = TRUE) %>% 
+     tokens_remove(stopwords("en")) %>% 
+     tokens_wordstem()
> blog.dfm <- dfm(blog.toks)
> blog.dfm
Document-feature matrix of: 13,246 documents, 102,320 features (99.83% sparse) and 5 docvars.
                     features
docs                  week fals statement lie dismiss apolog pakistani presid pervez musharraf
  poliblogs2008.csv.1    2    1         2   1       1      1         1      1      1         7
  poliblogs2008.csv.2    1    0         0   0       0      0         0      0      0         0
  poliblogs2008.csv.3    0    0         0   0       0      0         0      0      0         0
  poliblogs2008.csv.4    0    0         0   0       0      0         0      0      0         0
  poliblogs2008.csv.5    0    0         0   1       0      0         0      0      0         0
  poliblogs2008.csv.6    0    0         0   0       0      0         0      2      0         0
[ reached max_ndoc ... 13,240 more documents, reached max_nfeat ... 102,310 more features ]
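
As a rough, self-contained illustration of the tokens-first workflow above, here is a sketch on a two-document toy corpus. The texts and the names `toy_corpus`, `toy_stats`, `toy_toks`, `toy_dfm` and `toy_trimmed` are made up for illustration and are not taken from poliblogs2008.csv. It also covers `summary()` and `dfm_trim()`, the two steps mentioned in the question; the threshold `min_termfreq = 2` is an arbitrary choice.

```r
library(quanteda)

# Toy texts, invented for illustration only.
toy_corpus <- corpus(c(
  d1 = "The president dismissed the statement.",
  d2 = "The statement was false, the president said."
))

# Corpus statistics (one row per document).
toy_stats <- summary(toy_corpus)

# Tokenise first, then remove stopwords and stem.
toy_toks <- tokens(toy_corpus, remove_punct = TRUE)
toy_toks <- tokens_remove(toy_toks, stopwords("en"))
toy_toks <- tokens_wordstem(toy_toks)

# Build the document-feature matrix from the processed tokens.
toy_dfm <- dfm(toy_toks)

# Keep only features that occur at least twice across the corpus.
toy_trimmed <- dfm_trim(toy_dfm, min_termfreq = 2)

print(toy_stats)
print(toy_trimmed)
```

If this small example already fails with the Dimnames error, the data are unlikely to be the cause.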
Kohei Watanabe
  • `require` might *work* but, as explained in the posted links, it’s *not* “perfectly fine”. – Konrad Rudolph Jan 30 '23 at 22:12
  • Thanks a lot for the updated code. In the end I found that the error came from my installation: uninstalling and reinstalling R and RStudio solved the issue. – A.G. Jan 31 '23 at 23:37

I had the same error. Uninstalling and reinstalling R fixed it for me as well. Reinstalling RStudio was not necessary.
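
Since reinstalling R is what fixed it, it may help to record what is installed before and after. The following is only a minimal sketch and not something from the posts above. It prints the R version and the versions of quanteda and Matrix, then tries to build a small sparse matrix without involving quanteda. Looking at Matrix is an assumption on my part, based on the error mentioning Dimnames and on quanteda storing a dfm as a sparse matrix; `sparse_ok` is just an illustrative name.

```r
# Record the versions in play.
cat(R.version.string, "\n")
for (pkg in c("Matrix", "quanteda")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}

# Assumption: the failure may sit in sparse-matrix construction, so try it
# in isolation. Any error is caught and reported instead of stopping the script.
sparse_ok <- tryCatch({
  m <- Matrix::sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 1))
  inherits(m, "Matrix")
}, error = function(e) {
  message("Sparse matrix construction failed: ", conditionMessage(e))
  FALSE
})

print(sparse_ok)
```

If `sparse_ok` is FALSE with the same message, the problem lies outside quanteda and your script, which would be consistent with a reinstall of R (and its bundled packages) making it go away.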