How can I convert a corpus into a data frame in R that also contains the metadata? I already tried the suggestion from "convert corpus into data.frame in R", but the resulting data frame only contains the text lines from all documents in the corpus.
I also need the document ID and, if possible, the line number of each text line, as two additional columns.
So, how can I extend this command to get the data?
dataframe <- data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)
I already tried:
dataframe <-
data.frame(id=sapply(corpus, meta(corpus, "id")),
text=unlist(sapply(corpus, `[`, "content")),
stringsAsFactors=F)
but it didn't help; I only got the error message "Error in match.fun(FUN) : 'meta(corpus, "id")' is not a function, character or symbol".
The corpus is extracted from plain text files; here is an example:
> str(corpus)
[...]
$ 1178531510 :List of 2
..$ content: chr [1:67] " uberrasch sagt [...] gemacht echt schad verursacht" ...
..$ meta :List of 7
.. ..$ author : chr(0)
.. ..$ datetimestamp: POSIXlt[1:1], format: "2015-08-16 14:44:11"
.. ..$ description : chr(0)
.. ..$ heading : chr(0)
.. ..$ id : chr "1178531510" # <--- This is the ID I want in the data.frame
.. ..$ language : chr "de"
.. ..$ origin : chr(0)
.. ..- attr(*, "class")= chr "TextDocumentMeta"
..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
[...]
Many thanks in advance :)