I am trying to use readtext in R to import over 13,000 .rtf files, but I get the error message below.

uk <- readtext("/Users/path/*.rtf",
               docvarsfrom = "filenames",
               docvarnames = c("country", "year", "id"),
               dvsep = "_")
Error in chartr(.cptable[[cpname]]$before, .cptable[[cpname]]$after, out[parsed$toconv]) : 
  invalid input '' in 'utf8towcs'

When I applied the same code to a test folder containing only 1,000 files, it worked fine. However, when I increased the number of files in the folder to 5,000, the same error returned. The filenames are formatted like uk_1992_1.rtf or uk_2010_3568.rtf.

My questions are:

  1. Is this just a matter of trying to import too many files at once?

  2. Is there a way to fix this code to allow more files to be imported at once?

  3. Is there a workaround if there is no way to fix the code?

Apologies if this question has been asked elsewhere; I looked for a similar one but did not find any. I can split the files into several smaller folders (and have tried this, which seems to work fine), but there are more countries with the same number of files that will need to be processed and analysed the same way. TIA!

  • My guess is that one or more files are causing an error when read in. The solution would be either an iterative approach that is resilient (like a for loop with error catching, or a `map` call using `safely`) or identifying the problematic file and removing it. One stupid but effective way to do the latter is to give it different ranges of files to cover and slowly narrow down the one causing the problem. – geoff Jan 12 '22 at 19:28
  • Seems like a character encoding issue to me. Try different values for the `encoding` argument. If you are on Windows, the following may work: `encoding = 'WINDOWS-1252'` – hyena Jan 12 '22 at 19:53
  • Thank you both for your help. You are both correct in that one file was driving the error and that it was an encoding issue. I managed to track down the file and remove the special character manually instead and now readtext works perfectly. It's not an ideal (as in non-code) solution, but thanks again for pointing me in the right direction. – Ellie Ana Jan 14 '22 at 18:49
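
For anyone landing here with the same error, the "for loop with error catching" idea from the first comment could look roughly like the sketch below. It reads the files one at a time, so a single bad file is recorded instead of aborting the whole import. The names `read_each_safely` and `reader` are hypothetical helpers made up for illustration and are not part of readtext. The sketch also assumes that a reader returning `NULL` can be treated as a failure.

```r
# Sketch only: `read_each_safely` and `reader` are hypothetical names.
# `reader` defaults to readtext::readtext, but any function that takes a
# single file path as its first argument can be passed instead.
read_each_safely <- function(paths, reader = readtext::readtext, ...) {
  ok <- list()             # successfully read files, keyed by path
  failed <- character(0)   # paths that raised an error
  for (p in paths) {
    out <- tryCatch(
      reader(p, ...),
      error = function(e) {
        # Report the failing file and the error text, then carry on.
        message("Failed: ", p, " (", conditionMessage(e), ")")
        NULL
      }
    )
    if (is.null(out)) {
      failed <- c(failed, p)
    } else {
      ok[[p]] <- out
    }
  }
  list(ok = ok, failed = failed)
}

# Possible usage with the arguments from the question (not run here):
# paths <- list.files("/Users/path", pattern = "\\.rtf$", full.names = TRUE)
# res <- read_each_safely(paths,
#                         docvarsfrom = "filenames",
#                         docvarnames = c("country", "year", "id"),
#                         dvsep = "_")
# res$failed   # the files to inspect or repair by hand
```

Reading 13,000 files one by one is likely slower than a single `readtext()` call, so it may make more sense to use this once to find the offending files and then go back to the original call after fixing them.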
