
A passage in the file documentation caught my eye:

## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)

What exactly is this anonymous file? Does it exist on disk, or only in memory? I'm interested in this because I'm contemplating a program that may need to create and delete thousands of temporary files, and if this happens only in memory it seems like it would have a much smaller impact on system resources.

This Linux SO question appears to suggest that such a file could be a real disk file, but I'm not sure how relevant that is to this particular example. Additionally, the bigmemory documentation seems to hint at real disk-based storage (though I'm assuming the file-based anonymous file is being used):

It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the filebacking argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.

Alternatively, if textConnection is appropriate for this type of application (opened/closed hundreds or thousands of times) and is memory only, that would satisfy my needs. I was planning on doing this until I read the note in that function's documentation:

As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.

BrodieG
  • you could try writing tons of data to it and monitor your resources (DISK vs RAM) to find out... – flodel Feb 07 '14 at 01:25
  • @flodel, if I can't find a reference I'll try that, though I'd be concerned about potential for different implementations on different OSes. Ideally I'd prefer to rely on documented behavior. – BrodieG Feb 07 '14 at 01:28
  • An anonymous file is just a temporary file without a descriptor (a sort of header). So I think it depends on your OS; you should read carefully what a `temporary file` is and how/where it is stored. – agstudy Feb 07 '14 at 01:28
  • @agstudy, not what I was hoping to hear... This is for a package so OS independence would be ideal. Maybe that's wishful thinking. – BrodieG Feb 07 '14 at 01:30
  • On Linux, the contents of a file exist on disk until the last reference is deleted. A common trick is to open a file with a name given by `tmpnam` and then delete it while it is still open. The data exists on disk, but after the deletion it is only available to programs that had it open when it was deleted (almost certainly only the creating application will have had it open). If you're able to quickly make a hard link to the file between creation and deletion, it will then have a directory entry and will continue to live after the program closes it. – Matthew Lundberg Feb 07 '14 at 01:44

2 Answers

My C is very rusty, so hopefully more experienced people can correct me, but I think the answer to your question "What exactly is this anonymous file? Does it exist on disk, or only in memory?" is "It exists on disk".

Here is what happens at C level (I'm looking at the source code at http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz):

A. The function file_open, defined in src/main/connections.c:554, has the following logic for an anonymous file (one with an empty description), lines 565-568:

if(strlen(con->description) == 0) {
    temp = TRUE;
    name = R_tmpnam("Rf", R_TempDir);
} else name = R_ExpandFileName(con->description);

So a new temporary filename is generated if no file name was supplied to file.

B. If the name of the file is not equal to stdin, the call R_fopen(name, con->mode) happens at line 585 (there are some subtleties with Win32 and UTF-8 names, but we can ignore them for now).

C. The file name is then unlinked at line 607. The documentation for unlink says:

The unlink() function removes the link named by path from its directory and decrements the link count of the file which was referenced by the link. If that decrement reduces the link count of the file to zero, and no process has the file open, then all resources associated with the file are reclaimed. If one or more process have the file open when the last link is removed, the link is removed, but the removal of the file is delayed until all references to it have been closed.

So in effect the directory entry is removed, but the file continues to exist for as long as the R process keeps it open.

D. Finally, R_fopen is defined in src/main/sysutils.c:135 and just calls fopen internally.
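
For illustration, steps A to C can be sketched in plain C. This is not R's code: it assumes a POSIX system, swaps in `mkstemp` where R uses `R_tmpnam` followed by `R_fopen`, and the helpers `open_anonymous` and `name_exists` are hypothetical names made up for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of R): create a uniquely named file in
   `dir`, open it for reading and writing, then unlink its name at once.
   The path that was used is left in `path_out` for inspection. */
static FILE *open_anonymous(const char *dir, char *path_out, size_t path_len)
{
    assert(dir != NULL && path_out != NULL);

    /* Build a template such as "/tmp/RfXXXXXX" for mkstemp. */
    int n = snprintf(path_out, path_len, "%s/RfXXXXXX", dir);
    if (n < 0 || (size_t)n >= path_len)
        return NULL;                      /* template did not fit */

    int fd = mkstemp(path_out);           /* creates and opens the file */
    if (fd < 0)
        return NULL;

    FILE *fp = fdopen(fd, "w+");          /* wrap the descriptor in a stream */
    if (fp == NULL) {
        close(fd);
        unlink(path_out);
        return NULL;
    }

    /* Remove the directory entry. The data stays reachable through fp
       until the stream is closed. */
    unlink(path_out);
    return fp;
}

/* Hypothetical helper: 1 if `path` still has a directory entry, else 0. */
static int name_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}
```

Calling `open_anonymous("/tmp", path, sizeof path)`, writing `"abc\ndef\n"` with `fputs`, calling `rewind`, and reading with `fgets` should give the two lines back, while `name_exists(path)` reports that the name is already gone. Whether the bytes end up on a physical disk or in memory then depends only on what backs the directory passed in.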

Victor K.
  • This seems to line up with what Matthew mentioned in the comments. Thanks for doing the legwork. – BrodieG Feb 07 '14 at 02:07
  • As for whether it's in memory or on disk, that depends on where your temporary files live. Many linux systems put `/tmp` in memory. Windows doesn't. – Matthew Lundberg Feb 07 '14 at 03:49
  • @MatthewLundberg And there's no guarantee that in memory will be faster in R - `textConnection()` are notoriously slow – hadley Feb 07 '14 at 13:44

The file behaves like a regular file. However, unlike a regular file, it lives in RAM.

Zbigniew Mazur