250

How do you import a plain text file as a single character string in R? I expect this has a very simple answer, but when I tried it today I couldn't find a function that does it.

For example, suppose I have a file foo.txt with something I want to textmine.

I tried it with:

scan("foo.txt", what="character", sep=NULL)

but this still returned a vector. I got it working somewhat with:

paste(scan("foo.txt", what="character", sep=" "),collapse=" ")

but that is quite an ugly solution which is probably unstable too.

Sacha Epskamp

8 Answers

245

Here's a variant of the solution from @JoshuaUlrich that uses the correct size instead of a hard-coded size:

fileName <- 'foo.txt'
readChar(fileName, file.info(fileName)$size)

Note that readChar allocates space for the number of bytes you specify, so readChar(fileName, .Machine$integer.max) would try to allocate about 2 GB and is not a practical shortcut.
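As a minimal sketch of the approach above, the following writes a small demo file and reads it back in one call. The temporary file and the names n_bytes and txt are made up for illustration. For multi-byte encodings the byte count is only an upper bound on the number of characters, which should still be enough to read the whole file.

```r
# Hypothetical demo file; tempfile() keeps the example self-contained.
fileName <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), fileName)

# file.info()$size is the file size in bytes.
n_bytes <- file.info(fileName)$size

# readChar() reads up to n_bytes characters, line breaks included,
# and returns them as a single string.
txt <- readChar(fileName, n_bytes)

length(txt)  # a character vector of length one
```

Unlike readLines(), the result keeps the line breaks exactly as they are stored in the file.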

Tommy
  • It is worth pointing out that this code won't work for compressed files. In that case, the number of bytes returned by file.info(filename)$size will not match the actual content that will be read in memory, which we expect to be larger. – asieira Mar 17 '14 at 18:08
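The size mismatch described in this comment can be illustrated with a small sketch. The gzip file is created here only for the demo, and the names tmp, compressed_bytes and text are hypothetical. It reads through a gzfile() connection with readLines() instead of relying on the on-disk size.

```r
# Create a small gzip-compressed text file for the demo.
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(rep("the quick brown fox", 100), con)
close(con)

# Size on disk refers to the compressed bytes, not the text.
compressed_bytes <- file.info(tmp)$size

# Read the decompressed text through a gzfile() connection.
con <- gzfile(tmp, "r")
text <- paste(readLines(con), collapse = "\n")
close(con)

# The decompressed text is longer than the compressed file.
nchar(text) > compressed_bytes
```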
183

In case anyone is still looking at this question 3 years later, Hadley Wickham's readr package has a handy read_file() function that will do this for you.

# you only need to do this one time on your system
install.packages("readr")
library(readr)
mystring <- read_file("path/to/myfile.txt")
Abel Callejo
Sharon
61

I would use the following. It should work just fine, and doesn't seem ugly, at least to me:

singleString <- paste(readLines("foo.txt"), collapse=" ")
Josh O'Brien
  • I would have expected `collapse="\n"` to replicate the fact that these are separate lines on the original file. With this change, this solution *will* work for compressed and uncompressed files equally well. – asieira Mar 17 '14 at 18:09
  • This doesn't seem to work. If I writeLines(singleString), I get a corrupted file... – bumpkin Oct 28 '14 at 18:13
  • This does not work if the last line does not include an end of line character. In that case, the last line is not included in the string (alternatively, file is truncated at the last line break). – W7GVR Mar 06 '18 at 14:49
  • This will work fine for reading text files as in the OP's question: Text file connections are `blocking=TRUE` by default so `readLines()` will return the full file just with a warning about the missing EOL character. However @gvrocha's comment is worth heeding: understand your connection type! ?readLines help says `If the final line is incomplete (no final EOL marker) the behaviour depends on whether the connection is blocking or not. For a non-blocking text-mode connection the incomplete line is pushed back, silently. **For all other connections the line will be accepted, with a warning.**` – krads Apr 11 '19 at 23:52
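The points raised in these comments can be checked with a small sketch. The demo file is deliberately written without a final end-of-line marker, and path and lines are hypothetical names. It assumes the default blocking behaviour that readLines() uses when given a file name.

```r
# Write two lines with NO trailing newline after the last one.
path <- tempfile(fileext = ".txt")
cat("line one\nline two", file = path)

# readLines() still returns the incomplete last line;
# warn = FALSE only silences the "incomplete final line" warning.
lines <- readLines(path, warn = FALSE)

# collapse = "\n" keeps the original line structure in the single string.
singleString <- paste(lines, collapse = "\n")
```

With collapse = " " instead, the line breaks would be replaced by spaces, which may or may not matter for text mining.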
17

How about:

string <- readChar("foo.txt",nchars=1e6)
Joshua Ulrich
9

The readr package has a function to do everything for you.

install.packages("readr") # you only need to do this one time on your system
library(readr)
mystring <- read_file("path/to/myfile.txt")

This replaces the version in the package stringr.

Mike Stanley
8

Too bad that Sharon's solution cannot be used anymore. I've added Josh O'Brien's solution with asieira's modification to my .Rprofile file:

read.text = function(pathname)
{
    return (paste(readLines(pathname), collapse="\n"))
}

and use it like this: txt = read.text('path/to/my/file.txt'). I couldn't replicate bumpkin's finding (Oct 28 '14): writeLines(txt) showed the contents of file.txt. Also, after write(txt, '/tmp/out'), the command diff /tmp/out path/to/my/file.txt reported no differences.
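The same round-trip check can be sketched in R alone, without the external diff command. The temporary files src and out are placeholders, and writeLines() is used here in place of write().

```r
read.text <- function(pathname) {
  paste(readLines(pathname), collapse = "\n")
}

# Hypothetical input and output files for the demo.
src <- tempfile(fileext = ".txt")
out <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta", "", "gamma"), src)

txt <- read.text(src)

# writeLines() appends the final newline that paste() left out.
writeLines(txt, out)

# Compare the two files line by line.
same <- identical(readLines(src), readLines(out))
```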

2

readChar doesn't have much flexibility so I combined your solutions (readLines and paste).

I have also added a space between each line:

# Open in the default blocking mode so an unterminated last line is still read
con <- file("/Users/YourtextFile.txt", "r")
singleString <- readLines(con) # one element per line
singleString <- paste(singleString, collapse = " ") # join the lines with a space
close(con)
harris11
-1

Your solution does not seem all that ugly. You can wrap it in a function to make it look more professional, in either of these ways:

  • first way
new.function <- function(filename){
  readChar(filename, file.info(filename)$size)
}

new.function('foo.txt')
  • second way
new.function <- function(){
  filename <- 'foo.txt'
  return (readChar(filename, file.info(filename)$size))
}

new.function()
Kalana
  • This doesn't add anything to the answer provided by [@Tommy](https://stackoverflow.com/users/662787/tommy). Providing the path within a function environment is a particularly poor solution. – Konrad Nov 19 '19 at 16:29