I am reading a text file with readtext().
According to Notepad++ it is encoded in UTF-8, but I have not been able to verify this independently.
I am not sure whether the encoding is correct throughout or whether parts of the file are corrupted.
Windows Explorer reports a file size on disk of more than 200 MB.
When I read it and check its size in RAM with
format(object.size(my_rt), units = "MiB")
I get
[1] 15 MiB # I manually removed some irrelevant info
readtext() does not give any error or warning when I read the file with
my_rt <- readtext(nomeFile,
                  docvarsfrom = "filenames",
                  docvarnames = c("lng", "country", "type", "b", "c", "d"),
                  dvsep = "[_.]",
                  encoding = "UTF-8",
                  verbosity = 3)
I am almost certain that the file is not being read in full: a slightly larger file occupies 198.2 MiB in RAM after reading, and a smaller file occupies 157 MiB, so 15 MiB is far too little for this one.
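To narrow this down, one check I could run is to compare the raw bytes on disk with the bytes that actually end up in the text, and to look for things that might plausibly cut a read short, such as embedded NUL bytes or invalid UTF-8 sequences. The following is only a sketch in base R: check_file() is a hypothetical helper I made up, not part of readtext, and whether NUL bytes or encoding errors are really the cause here is an assumption I have not confirmed. The demo at the end uses a small temporary file so that it runs on its own.

```r
# Hypothetical diagnostic helper (not part of readtext).
# path: file on disk; text: character vector holding what was read from it.
check_file <- function(path, text) {
  # Read the file as raw bytes so that no decoding happens.
  raw_bytes <- readBin(path, what = "raw", n = file.size(path))
  is_nul <- raw_bytes == as.raw(0)
  list(
    bytes_on_disk = length(raw_bytes),
    bytes_in_text = sum(nchar(text, type = "bytes")),
    nul_bytes     = sum(is_nul),
    has_bom       = length(raw_bytes) >= 3 &&
      identical(raw_bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF))),
    # rawToChar() fails on embedded NULs, so drop them before validating.
    valid_utf8    = validUTF8(rawToChar(raw_bytes[!is_nul]))
  )
}

# Self-contained demo on a small temporary file.
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("abc\ndef\n"), tmp)
txt <- readChar(tmp, nchars = file.size(tmp), useBytes = TRUE)
res <- check_file(tmp, txt)
str(res)
```

For the real data the call would be something like check_file(nomeFile, my_rt$text), assuming nomeFile points to a single file. If bytes_in_text is far below bytes_on_disk, or nul_bytes is greater than zero, or valid_utf8 is FALSE, that would at least indicate where to look next.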
Is there a way to find out what is going wrong in readtext(), and at which point in the file?
Should I report this as an issue for readtext even though I do not understand what the problem is?