
According to the manual page of the truncate function in R, on some platforms including Windows:

... it will not work for large (> 2Gb) files

After some experimentation, I put together a toy example showing that this is possible (and quite easy) for large files with Visual C++:

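// ConsoleApplication1.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"   // Visual Studio precompiled header; drop it if precompiled headers are off
#include <windows.h>
#include <tchar.h>
#include <iostream>
#include <string>

//  Forward declarations:
bool append(LPCTSTR, LPCVOID, DWORD);
bool readTail(LPCTSTR, LPVOID, DWORD);
bool truncateTail(LPCTSTR, LONGLONG);


int main()
{
    LPCTSTR fn = _T("C:/kaiyin/kybig.out");   // _T() compiles with and without UNICODE
    char buf[] = "helloWorld";
    append(fn, buf, 10);

    BYTE buf1[10] = {0};
    readTail(fn, buf1, 5);                    // tail before truncating: "World"
    std::cout << (char*) buf1 << std::endl;

    truncateTail(fn, 5);                      // cut off "World"
    ZeroMemory(buf1, sizeof buf1);
    readTail(fn, buf1, 5);                    // tail after truncating: "hello"
    std::cout << (char*) buf1 << std::endl;

    std::cout << "End of program" << std::endl;
    std::string s;
    std::getline(std::cin, s);                // keep the console window open
    return 0;
}

// Append writeSize bytes of buf to the end of the file (created if missing).
bool append(LPCTSTR filename, LPCVOID buf, DWORD writeSize) {
    HANDLE fh = CreateFile(filename, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fh == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFile failed: " << GetLastError() << std::endl;
        return false;
    }
    LARGE_INTEGER zero;
    zero.QuadPart = 0;
    DWORD written = 0;   // may only be NULL when an OVERLAPPED struct is passed
    bool ok = SetFilePointerEx(fh, zero, NULL, FILE_END)
           && WriteFile(fh, buf, writeSize, &written, NULL);
    CloseHandle(fh);
    return ok && written == writeSize;
}

// Read the last readSize bytes of the file into buf.
bool readTail(LPCTSTR filename, LPVOID buf, DWORD readSize) {
    HANDLE fh = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fh == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFile failed: " << GetLastError() << std::endl;
        return false;
    }
    LARGE_INTEGER size;
    size.QuadPart = 0;
    DWORD bytesRead = 0;
    bool ok = GetFileSizeEx(fh, &size) && size.QuadPart >= (LONGLONG) readSize;
    if (ok) {
        size.QuadPart -= readSize;            // 64-bit offset, so files > 2 GB are fine
        ok = SetFilePointerEx(fh, size, NULL, FILE_BEGIN)
          && ReadFile(fh, buf, readSize, &bytesRead, NULL);
    }
    CloseHandle(fh);
    return ok && bytesRead == readSize;
}

// Remove the last truncateSize bytes of the file.
bool truncateTail(LPCTSTR filename, LONGLONG truncateSize) {
    HANDLE fh = CreateFile(filename, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fh == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFile failed: " << GetLastError() << std::endl;
        return false;
    }
    LARGE_INTEGER size;
    size.QuadPart = 0;
    bool ok = GetFileSizeEx(fh, &size) && size.QuadPart >= truncateSize;
    if (ok) {
        size.QuadPart -= truncateSize;
        // Move to the new end of the file, then cut the file there.
        ok = SetFilePointerEx(fh, size, NULL, FILE_BEGIN) && SetEndOfFile(fh);
        if (!ok)
            std::cerr << "Truncation failed: " << GetLastError() << std::endl;
    }
    CloseHandle(fh);                          // release the handle on every path
    return ok;
}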

This will append "helloWorld" to the file "C:/kaiyin/kybig.out" and then truncate the trailing "World". In the console it should print "World" (the tail before truncating), then "hello" (the tail after truncating).

There seems to be no problem at all with truncating the tail of a file larger than 2 GB; in fact, I have tested it with a 4e9-byte (roughly 4 GB) file and the program still behaves correctly.
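For comparison, the same append / read-tail / truncate-tail round trip can be sketched in portable C++17 with std::filesystem::file_size and std::filesystem::resize_file instead of the Win32 calls. This is a swapped-in technique, not the code above; the helper names (appendBytes, readTail, truncateTail) are hypothetical. Sizes are passed as std::uintmax_t, so nothing in this interface limits the file to 2 GB; whether a particular standard library handles such files correctly underneath is an assumption worth testing.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Append raw bytes to the end of the file, creating it if needed.
bool appendBytes(const fs::path& p, const std::string& data) {
    std::ofstream out(p, std::ios::binary | std::ios::app);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Return the last n bytes of the file, or an empty string on error.
std::string readTail(const fs::path& p, std::uintmax_t n) {
    std::error_code ec;
    std::uintmax_t size = fs::file_size(p, ec);
    if (ec || n > size) return {};
    std::ifstream in(p, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(size - n), std::ios::beg);  // 64-bit offset
    std::string tail(static_cast<std::size_t>(n), '\0');
    in.read(&tail[0], static_cast<std::streamsize>(n));
    return in ? tail : std::string{};
}

// Drop the last n bytes of the file; returns false on error.
bool truncateTail(const fs::path& p, std::uintmax_t n) {
    std::error_code ec;
    std::uintmax_t size = fs::file_size(p, ec);
    if (ec || n > size) return false;
    fs::resize_file(p, size - n, ec);
    return !ec;
}
```

Used the same way as the Win32 version: append "helloWorld", then readTail(p, 5) gives "World", and after truncateTail(p, 5) it gives "hello".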

Am I missing something, or is it true that the truncate function can indeed be reliably (and easily) implemented on Windows?


Update

Following @hrbrmstr's pointer to the R Bugzilla reports linked in the comments below, I tried some R code to check whether the truncate function works properly on Windows 8.1:

filename = "C:/kaiyin/kybig.out"
f = file(filename, "w")
seek(f, 5L, "end")
truncate(f)
file.info(filename)$size

Results:

> filename = "C:/kaiyin/kybig.out"
> f = file(filename, "w")
> seek(f, 5L, "end")
[1] 0
> truncate(f)
NULL
> file.info(filename)$size
[1] 0

Apparently truncate simply discards everything, despite the seek to near the end.
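One thing worth ruling out before blaming truncate: in the C runtime, opening an existing file with mode "w" truncates it to zero length at open time, before any seek or truncate runs. It is an assumption here that R's file(filename, "w") behaves the same way; a read/write mode such as "r+b" would be the non-destructive choice. The C++ sketch below (helper names writeFile and openAndClose are hypothetical) shows the difference with std::fopen.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>

namespace fs = std::filesystem;

// Create (or overwrite) a file containing exactly the given text.
void writeFile(const fs::path& p, const char* text) {
    std::FILE* f = std::fopen(p.string().c_str(), "wb");
    if (f) {
        std::fputs(text, f);
        std::fclose(f);
    }
}

// Open the file with the given mode and close it again without writing anything.
void openAndClose(const fs::path& p, const char* mode) {
    std::FILE* f = std::fopen(p.string().c_str(), mode);
    if (f) std::fclose(f);
}
```

After writeFile(p, "helloWorld"), openAndClose(p, "r+b") leaves all 10 bytes in place, while openAndClose(p, "w") leaves a 0-byte file, which is the same size that file.info(filename)$size reported above.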

qed
  • Is that not just saying if you're using an old copy of windows on a FAT16 partition, the maximum supported file size is 2GB, so that's all this will support as well? – Octopoid Sep 23 '15 at 09:57
  • @qed did you try the built-in `truncate` under Windows? this may just be outdated documentation. [here's](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=7879) [why](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=7880) i think that may be the case. – hrbrmstr Sep 23 '15 at 10:29
  • @hrbrmstr It seems not to be working, I will post a new question on that. – qed Sep 23 '15 at 11:13
  • @hrbrmstr: See my answer to the linked question: http://stackoverflow.com/a/32725144/946850 – krlmlr Sep 23 '15 at 11:23

2 Answers


Am I missing something, or is it true that the truncate function can indeed be reliably (and easily) implemented on Windows?

The likely explanation is that the problem has nothing to do with Windows and everything to do with the implementation of the R function. On Windows it probably uses a signed 32-bit integer to specify the truncated file size, hence the 2 GB limit.
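To put numbers on this: a signed 32-bit size tops out at 2^31 - 1 = 2,147,483,647 bytes, which matches the "2Gb" in the manual page, while the 4e9-byte test file from the question is well past that. The Win32 code avoids the problem because LARGE_INTEGER::QuadPart is 64 bits. A tiny illustrative check (the constant and function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest file size a signed 32-bit field can describe (about 2 GiB).
constexpr std::int64_t kMaxInt32Size = std::numeric_limits<std::int32_t>::max();

// Size of the test file mentioned in the question.
constexpr std::int64_t kTestFileSize = 4000000000LL;

static_assert(kMaxInt32Size == 2147483647, "2^31 - 1");
static_assert(kTestFileSize > kMaxInt32Size, "4e9 bytes does not fit in int32_t");
static_assert(kTestFileSize <= std::numeric_limits<std::uint32_t>::max(),
              "4e9 would still fit in an unsigned 32-bit value, but only just");

// True if a file of this size can be passed through a signed 32-bit parameter.
constexpr bool fitsInInt32(std::int64_t size) {
    return size >= 0 && size <= kMaxInt32Size;
}
```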

It is also plausible that the documentation is out of date and that the R developers have since worked out how to implement this function correctly on Windows.

David Heffernan

R does not seem to use the Windows API to access files; instead, it appears to use a POSIX (or POSIX-like) layer, at least according to the source code I mentioned in the linked question. So while truncating huge files probably works on Windows through the Windows API (as your code shows), it may well be that the POSIX(-like) layer R uses does not (yet) fully support this (again, see the source).
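To illustrate the layering point, here is a sketch of truncating through the C-runtime file-descriptor layer rather than through Win32 directly. It is not R's code; it only shows which calls are involved. In the Microsoft C runtime, _chsize takes a long (32 bits on Windows, so at most about 2 GB), whereas _chsize_s takes a 64-bit __int64; on POSIX systems ftruncate takes an off_t, which is 64 bits on typical 64-bit Linux builds. Whether a given layer handles large files therefore depends on which variant it calls. The helper name truncateToSize is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <io.h>         // _chsize_s, _fileno
#else
#include <sys/types.h>  // off_t
#include <unistd.h>     // ftruncate
#endif

// Truncate (or extend) an open stdio file to newSize bytes via the C runtime.
// Uses the 64-bit variant on both platforms; returns true on success.
bool truncateToSize(std::FILE* f, std::int64_t newSize) {
    std::fflush(f);  // push buffered writes down to the descriptor first
#ifdef _WIN32
    // _chsize(fd, long) would cap the size at LONG_MAX (about 2 GB) on Windows.
    return _chsize_s(_fileno(f), newSize) == 0;
#else
    return ftruncate(fileno(f), static_cast<off_t>(newSize)) == 0;
#endif
}
```

For example, after writing "helloWorld" to a std::tmpfile(), truncateToSize(f, 5) should leave a 5-byte file.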

krlmlr
  • I'm not sure it's really a POSIX layer. That dresses it up somewhat. The R source is calling POSIX like functions in the Windows C runtime. Which stand on top of the Win32 layer. – David Heffernan Sep 23 '15 at 11:58
  • @DavidHeffernan: Amended, thanks. R on Windows seems to depend on MinGW, but I'm really on thin ice here. Anyway, layer is layer. – krlmlr Sep 23 '15 at 13:12