The `pattern` argument won't work if the filename encoding is wrong. Call `list.files()` without `pattern=` instead, and you can at least get character strings for the mis-encoded filenames, which you can then work with and possibly fix in R.
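To find out which of the returned names are the mis-encoded ones, base R's `validUTF8()` (available since R 3.3.0) checks the raw bytes of each string, whatever encoding the string claims to have. The sketch below doesn't touch the disk: the vector `files` is a hypothetical stand-in for what `list.files()` might return, with the first entry built from the latin1 bytes of "æøå.txt".

```r
# Hypothetical stand-in for list.files() output (no pattern=):
# the first name holds the latin1 bytes of "æøå.txt", the second is valid UTF-8
latin1.bytes <- as.raw(c(0xe6, 0xf8, 0xe5, 0x2e, 0x74, 0x78, 0x74))
files <- c(rawToChar(latin1.bytes), "\u00c6\u00d8\u00c5.txt")

# validUTF8() inspects the bytes, so it flags the latin1 name as invalid UTF-8
bad <- !validUTF8(files)
print(bad)
```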
This is a minimal demonstrating example (it needs the `convmv` system command to set up the test case):
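```r
# Work in a fresh temporary directory
dir.create( wd <- tempfile() )
setwd(wd)

# convmv is only used to create a filename with the "wrong" encoding
convmv <- Sys.which("convmv")
if( convmv == "" )
  stop("Need convmv available to continue")

# First file: created with a UTF-8 name, then renamed to latin1 bytes
f1 <- "æøå.txt"
cat( "foo\n", file=f1 )
system2( convmv, args=c("-f", "utf8", "-t", "latin1", "--notest", f1) )

# Second file: keeps its UTF-8 name
f2 <- "ÆØÅ.txt"
cat( "bar\n", file=f2 )

# Without pattern= both files are listed ...
plain.list.files <- list.files()
stopifnot( length( plain.list.files ) == 2 )

# ... but with pattern= the latin1-named file is missed
with.pattern.list.files <- list.files( pattern="\\.txt" )
stopifnot( length( with.pattern.list.files ) == 1 )
```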
Fixing the character set can be done, but I'm not sure if you're asking about that at this point.
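One way to do that fixing, assuming the non-UTF-8 names are latin1, is base R's `iconv()`, which converts the bytes rather than just relabelling the string. As before, `files` is a hypothetical stand-in for `list.files()` output, not something produced by the example above:

```r
# Hypothetical list.files() output: latin1 bytes of "æøå.txt", then a UTF-8 name
files <- c(rawToChar(as.raw(c(0xe6, 0xf8, 0xe5, 0x2e, 0x74, 0x78, 0x74))),
           "\u00c6\u00d8\u00c5.txt")

# Assumption: anything that isn't valid UTF-8 is latin1
bad <- !validUTF8(files)

# Convert only the offending names; the valid UTF-8 ones are kept as they are
fixed <- files
fixed[bad] <- iconv(files[bad], from = "latin1", to = "UTF-8")

print(validUTF8(fixed))
```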
EDIT: Actually working with, or fixing, these filenames:

Now that you can get at the filenames, however badly encoded they are, the following might help if you know which encoding they use (latin1, for example). Ironically, `detect_str_enc()` from the uchardet package doesn't get it right (and I found no good alternative), but if you know that any filename that isn't ASCII or UTF-8 will be latin1, then this might be a working fix for you:
```r
library(uchardet)

# Assumption: every filename that isn't ASCII or UTF-8 is latin1
hard.coded.encoding <- "latin1"

nice.filenames <- sapply( plain.list.files, function(fname) {
  if( !detect_str_enc(fname) %in% c("ASCII","UTF-8") ) {
    # Declare the string's encoding; the bytes themselves are unchanged
    Encoding(fname) <- hard.coded.encoding
  }
  return( fname )
})

## Now it's presumably safe to look for our pattern:
i.txt <- grepl( "\\.txt$", nice.filenames )

## And we can now work with the files and present them nicely.
## The original names are used for file access, since they match the bytes on disk:
file.data <- lapply( plain.list.files[i.txt], function(fname) {
  ## Do what you want to do with the file here:
  readLines( fname )
})
names(file.data) <- nice.filenames[i.txt]
```
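If the pattern itself is plain ASCII, like `"\\.txt$"`, a possibly simpler alternative to fixing the encodings first is to match byte-wise with `grepl(..., useBytes = TRUE)`. This is a different technique from the one above, shown as a sketch on a hypothetical stand-in vector rather than real `list.files()` output:

```r
# Hypothetical list.files() output: latin1 bytes of "æøå.txt",
# a UTF-8 name, and a name that shouldn't match
files <- c(rawToChar(as.raw(c(0xe6, 0xf8, 0xe5, 0x2e, 0x74, 0x78, 0x74))),
           "\u00c6\u00d8\u00c5.txt",
           "notes.csv")

# Match on raw bytes, so the invalid UTF-8 in the first name doesn't get in the way
i.txt <- grepl("\\.txt$", files, useBytes = TRUE)
print(i.txt)
```

Because the matched elements are the unmodified names, on real `list.files()` output they could be passed straight to `readLines()` and friends; you'd still need a conversion step if you want to display the names nicely.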