I wrote a small R script. The input is a set of text files (thousands of journal articles). I generated the metadata (including the publication year) from the file names. Now I want to calculate the total number of tokens per year, but I am not getting anywhere.
library(readtext)
library(quanteda)

# Read the text files and take the metadata (docvars) from the filenames,
# which follow the pattern Unit_Year_Volume_Issue
rawdata_SPARA <- readtext("SPARA_paragraphs/*.txt", docvarsfrom = "filenames", dvsep = "_",
                          docvarnames = c("Unit", "Year", "Volume", "Issue"))

# Keep only the four-digit year
rawdata_SPARA$Year <- substr(rawdata_SPARA$Year, 1, 4)

# Build the corpus
SPARA_corp <- corpus(rawdata_SPARA)
I used the tokens_by() function of the quanteda package, but it seems to be outdated. Does anyone here know a solution?
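This is roughly what I imagine the last step should look like (only a sketch; the variable names are placeholders, and I am not sure whether combining ntoken() with base R tapply() is the intended quanteda approach):

library(quanteda)

# Tokenize the corpus built above
toks <- tokens(SPARA_corp)

# Token count per document
ntok_per_doc <- ntoken(toks)

# Sum the per-document counts by the Year docvar
tokens_per_year <- tapply(ntok_per_doc, docvars(toks, "Year"), sum)
tokens_per_year

If there is a cleaner, purely quanteda way to aggregate by Year (e.g. via grouping a dfm), I would be happy to use that instead.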