This question, Write lines of text to a file in R, shows three different methods for saving output to a plain text file. Using the example from the question, let's say that we want to create a file named output.txt with this text:

Hello
World

The question's answers show three methods:

  1. Using writeLines():
fileConn<-file("output.txt")
writeLines(c("Hello","World"), fileConn)
close(fileConn)
  2. Using sink():
sink("output.txt")
cat("Hello")
cat("\n")
cat("World")
sink()
  3. Using cat():
cat("Hello\n",file="output.txt")
cat("World\n",file="output.txt",append=TRUE)
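
As a quick check, the following sketch writes the same two lines with each approach and reads the files back. It is only an illustration: the temporary file names are made up here, and the newlines are written explicitly so that all three files end up with identical content.

```r
# Hypothetical temporary files, one per method
tmp <- tempdir()
f_writeLines <- file.path(tmp, "writeLines.txt")
f_sink <- file.path(tmp, "sink.txt")
f_cat <- file.path(tmp, "cat.txt")

# 1. writeLines() on an explicitly opened connection
con <- file(f_writeLines, open = "w")
writeLines(c("Hello", "World"), con)
close(con)

# 2. sink() diverts console output to the file until sink() is called again
sink(f_sink)
cat("Hello\n")
cat("World\n")
sink()

# 3. cat() with file=; the second call appends to the existing file
cat("Hello\n", file = f_cat)
cat("World\n", file = f_cat, append = TRUE)

# Read every file back to compare the contents
results <- lapply(c(f_writeLines, f_sink, f_cat), readLines)
```

Each element of `results` should be the character vector `c("Hello", "World")`.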

Some of the answers and comments note that cat() would be slower than the other two methods. However, my questions are:

  1. Are there situations when one method is better than the others?
  2. If one method is more idiomatically correct or quicker than the other two methods in R, why?

I searched SO and only found the linked answer. I have found other "why" questions on SO (e.g., Why is processing a sorted array faster than processing an unsorted array?), so I think this question is on topic for the site.

Richard Erickson
  • This is an interesting question, but I think it could go off on many tangents. Do you have any more info about your intended use case or some direction for answering the questions? For instance, `sink()` is inherently different in my opinion, because it diverts output from the console. `writeLines()` seems like the best option for dealing with a lot of text... and so on. – Matt Jun 02 '22 at 18:19
  • Not an answer, but `capture.output(cat("Hello\nWorld\n"), file="outfile.txt")` is a fourth option. – dcarlson Jun 02 '22 at 18:22
  • @Matt `because it diverts output from the console` is part of an answer I'm looking for. I could not find any broad R documentation about why to use one function over another. – Richard Erickson Jun 02 '22 at 18:29
  • As per the [documentation](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/cat), `cat()` accepts an arbitrary number of arguments, converts them to character vectors, concatenates them and appends the given separator to each element. `writeLines()` accepts a single character vector and writes it to a connection. The docs suggest that `cat()` is useful for producing output in user-defined functions. – MorganK95 Jun 07 '22 at 01:14

3 Answers

A short performance comparison gives a clear advantage to writeLines:

system.time(writeLines(con = 'writeLines.txt',text = paste('Line ',1:100000)))
#>        user      system       total 
#>        0.17        0.01        0.18

system.time(cat( paste('Line ',1:100000),file="cat.txt",sep="\n"))
#>        user      system       total 
#>        0.25        0.85        1.11

Looking at the C code (builtin.c), cat prints via Rprintf:

All printing in R is done via the functions Rprintf and REprintf or their (v) versions Rvprintf and REvprintf.
These routines work exactly like (v)printf(3). Rprintf writes to "standard output".
It is redirected by the sink() function.

As stated above, sink() redirects this output to another connection. Redirecting once to a file with sink() is definitely faster than opening the file, appending to it, and closing it again for every new line, which is what cat(file=..., append=TRUE) does.
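
The difference between reopening the file for every line and writing through a connection that is opened once can be sketched as follows. This is an illustration only, with made-up temporary files and a small line count; it shows the two patterns and checks that they produce the same file, and it makes no claim about timings on any particular machine.

```r
lines_to_write <- paste("Line", 1:1000)
f_append <- tempfile(fileext = ".txt")  # hypothetical file names
f_conn <- tempfile(fileext = ".txt")

# Pattern A: every cat() call opens the file, appends one line, and closes it
for (ln in lines_to_write) {
  cat(ln, "\n", sep = "", file = f_append, append = TRUE)
}

# Pattern B: open the connection once, write all lines, then close it
con <- file(f_conn, open = "w")
for (ln in lines_to_write) {
  cat(ln, "\n", sep = "", file = con)
}
close(con)

# Both patterns should yield identical content
same_content <- identical(readLines(f_append), readLines(f_conn))
```

Wrapping each loop in `system.time()` is one way to compare the two patterns on your own system.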

writeLines uses a dedicated C function Rconn_printf located in connections.c and is quicker.

To sum up:

  • cat is the standard way to print R output to the console
  • sink redirects cat output to another connection, such as a file, so multiple lines can be written without reopening the connection
  • writeLines is quicker than sink+cat for file output

Suggestion, when to use which:

  • I like cat() for logging R script progress in console mode. Once the script is validated, sink() can redirect this output to a file if needed, which might be useful for script automation.
  • I use writeLines() when I specifically want to write data (not just a log) to a file, because of its better performance.
Richard Erickson
Waldi
  • Thanks for your answer. Is either route more idiomatic for R for simple applications? Also, thanks for explaining the why. – Richard Erickson Jun 10 '22 at 20:03
  • I like `cat` for logging R script progress in console mode. Once the script is validated, `sink` can redirect this to a file if needed, which might be useful for script automation. I use `writeLines` when I specifically want to write data (not just a log) to a file, because of its better performance. – Waldi Jun 10 '22 at 20:20
  • Thank you for the guidance about when to use which. I guess that's what I was originally looking for, even if my question was not clear. I edited your guidance into the question. Please edit if needed. Also, I gave you the bounty because of your guidance. – Richard Erickson Jun 10 '22 at 21:26
  • @RichardErickson The output of **both** `cat` and `writeLines` goes to the console by default, and for **both** it can be redirected to a file using `sink`, so there is no difference on that point. The difference is that `cat` can take *R objects*, which need to be converted to character. That conversion takes time, so `cat` is slower than `writeLines`, which takes only a *character vector* that can be sent out as it is. – GKi Jun 11 '22 at 06:47

cat accepts R objects, whereas writeLines takes a single character vector.
cat converts numeric/complex elements the way print does, not the way as.character does, so the digits and scipen options are relevant.

options(digits = 3)
x <- 0.123456789

cat("Result:", x)
#Result: 0.123

writeLines(paste("Result:", x))
#Result: 0.123456789

writeLines(paste("Result:", format(x)))
#Result: 0.123

writeLines(c("Result:", format(x)), sep=" ")
#Result: 0.123

Converting R objects takes time.
With a character vector, writeLines is as convenient as cat but more efficient.
With objects of different types, cat is more convenient but slower.
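
When mixed values have to go through writeLines anyway, one option is to build the character vector explicitly first. The sketch below assumes sprintf() as the formatting step, which is a swapped-in alternative to the format() calls above, and the variable names are made up for illustration. Unlike cat, the number of decimals is fixed in the format string rather than taken from options(digits).

```r
x <- 0.123456789
n <- 42L

# sprintf() converts the values with an explicit format,
# so the result does not depend on options(digits)
line <- sprintf("Result: %.3f (n = %d)", x, n)

# writeLines() receives a ready-made character vector
writeLines(line)
#> Result: 0.123 (n = 42)
```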

GKi
  • Thanks for your answer. Is either route more idiomatic for R for simple applications? – Richard Erickson Jun 10 '22 at 20:02
  • If there is only a character vector and speed matters, use `writeLines`. If there are different types, `cat` would be easier from my point of view, but it will be slower. – GKi Jun 10 '22 at 20:08
  • Would you mind editing that into the answer, under a "when to use"-type header? – Richard Erickson Jun 10 '22 at 20:11
  • I thought this was already in the answer, in other words: *With a character vector, `writeLines` will be as convenient as `cat` but more efficient. With different objects, `cat` will be more convenient but slower.* – GKi Jun 10 '22 at 20:13

Regarding performance in R it is almost always worth looking into library(data.table).

Here is a benchmark taking into account data.table::fwrite:

library(data.table)
library(microbenchmark)

microbenchmark(
  writeLines = {
    writeLines(con = 'writeLines.txt', text = paste('Line', 1:100000))
  },
  cat = {
    cat(paste('Line', 1:100000), file = "cat.txt", sep = "\n")
  },
  fwrite = {
    fwrite(as.list(paste('Line', 1:100000)), file = "fwrite.txt", sep = "\n")
  },
  times = 1L
)

Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
 writeLines  202.5904  202.5904  202.5904  202.5904  202.5904  202.5904     1
        cat 1234.6644 1234.6644 1234.6644 1234.6644 1234.6644 1234.6644     1
     fwrite  106.8576  106.8576  106.8576  106.8576  106.8576  106.8576     1
ismirsehregal
  • Thank you for the benchmark. Does data.table have a way to write plain text, like "hello world! This is the output from a print(lm) * 1000 ", to a file that is not CSV or tidy-data style? – Richard Erickson Jun 09 '22 at 15:17
  • Not sure if I'm getting the question correctly but as long as you wrap it in a list it should be working: `fwrite(list("hello world! This is the output from an print(lm) * 1000 "), file = "test.txt", sep = "\n")` – ismirsehregal Jun 09 '22 at 15:20
  • With sink("text.txt") one can dump output such as `print(lm(...))` to a text file. Would that work with `fwrite()`, or would the data need to be formatted? The fwrite docs say it is for CSV-type data https://www.rdocumentation.org/packages/data.table/versions/1.14.2/topics/fwrite – Richard Erickson Jun 09 '22 at 15:29
  • Yes - `fwrite` is intended to write csv files but it can be used to write "lines" as well (or a single column). `fwrite` expects a list of same length vectors as its input - accordingly (afaik) it can't directly work on R output. – ismirsehregal Jun 09 '22 at 19:00
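
For the `print(lm(...))` case raised in these comments, one base-R route is to capture the printed output as a character vector and hand it to writeLines. This is a sketch under assumptions rather than anything proposed in the answers: the model on the built-in `cars` data set and the temporary file name are made up for illustration. The same vector could presumably be wrapped in a list and passed to `fwrite` as described above.

```r
# Hypothetical example model on a data set that ships with R
fit <- lm(dist ~ speed, data = cars)

# capture.output() returns the printed representation, one string per line
printed <- capture.output(print(fit))

# Write the captured lines to a (temporary) plain text file
out_file <- tempfile(fileext = ".txt")
writeLines(printed, out_file)

# Read the file back to confirm it holds the same lines
read_back <- readLines(out_file)
```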