Does anyone know how to change the language code that textcat gives as output? My real-world output looks something like this:
Reproducible example:
library(textcat)
library(tidyverse)
df <- data.frame(
  text = c("Das ist deutsch", "This is english", "C'est francais")
)
df <- df |>
  mutate(
    lang_textcat = textcat(text)
  )
## ISO 639-1 codes
# german == de
# english == en
# french == fr
df <- df |>
  mutate(
    lang_iso = c("de", "en", "fr")
  )
The column lang_textcat shows what I get from textcat. What I want is output like the column lang_iso. Is there an option to change the output to ISO 639-1? I could recode it manually, but it would be great if there were a built-in option.
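For reference, the manual recode I have in mind would look roughly like the sketch below. It assumes textcat returns lowercase English language names such as "german", "english" and "french"; the names iso_lookup and lang_textcat are only illustrative, and the input vector is a stand-in for the real textcat() output rather than an actual result.

```r
# Hypothetical lookup table: textcat-style language names -> ISO 639-1 codes.
# It would need one entry for every language that appears in the data.
iso_lookup <- c(german = "de", english = "en", french = "fr")

# Stand-in for textcat(df$text); these values are assumed, not real output.
lang_textcat <- c("german", "english", "french", NA)

# Index the named vector by name; NA or unlisted languages come back as NA.
lang_iso <- unname(iso_lookup[lang_textcat])
```

This works, but the lookup table has to be maintained by hand, which is why a built-in option would be preferable.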
textcat package: https://cran.r-project.org/web/packages/textcat/textcat.pdf
Thanks!