
I'm thinking of porting some of my cross-platform scripts to node.js partly to learn node.js, partly because I'm more familiar with JavaScript these days, and partly due to problems with large file support in other scripting languages.

Some scripting languages seem to have patchy support for large file offsets, depending on such things as whether they are running on a 32-/64-bit OS or processor, or need to be specifically compiled with certain flags.

So I want to experiment with node.js anyway, but Googling I'm not finding much either way on its support (or its library/framework support etc.) for large files with 64-bit offsets.

I realize that to some extent this will depend on JavaScript's underlying integer support at least. If I correctly read "What is JavaScript's Max Int? What's the highest Integer value a Number can go to without losing precision?", it seems that JavaScript uses floating point internally even for integers and therefore

the largest exact integral value is 2^53
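
A minimal sketch of what that limit means for file offsets, assuming an engine whose Numbers are IEEE 754 doubles (the variable names are illustrative only):

```javascript
// Numbers are doubles, so integers are exact only up to 2^53.
var limit = Math.pow(2, 53);                    // 9007199254740992

// Below the limit, adjacent integers are still distinguishable.
var exactBelow = (limit - 1) !== (limit - 2);   // true

// At the limit, adding 1 is lost to rounding.
var collidesAbove = (limit + 1) === limit;      // true

// A 32-bit signed offset tops out at 0x7fffffff; the first offset past it
// is still far below 2^53 and is represented exactly.
var firstBeyond31Bits = 0x7fffffff + 1;         // 2147483648

console.log(limit, exactBelow, collidesAbove, firstBeyond31Bits);
```

So the Number type itself should not be the obstacle for offsets just past 2 GiB; any limit there would have to come from the runtime or the OS layer.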

Then again node.js is intended for servers and servers should expect large file support.

Does node.js support 64-bit file offsets?


UPDATE

Despite the `_LARGEFILE_SOURCE` and `_FILE_OFFSET_BITS` build flags, now that I've started porting my project that requires this, I've found that `fs.read(files.d.fd, chunk, 0, 1023, 0x7fffffff, function (err, bytesRead, data)` succeeds but `0x80000000` fails with `EINVAL`. This is with version `v0.6.11` running on 32-bit Windows 7.

So far I'm not sure whether this is a limitation only in fs, a bug in node.js, or a problem only on Windows builds.

Is it intended that greater-than-31-bit file offsets work in node.js in all core modules on all platforms?

hippietrail
  • Can you give a code example of something that you're trying to achieve that may break? – jcolebrand Dec 01 '12 at 00:55
  • Well I don't know node's io libs yet but pseudocode would be something like this: `offset = getOffsetFromIndex(...); dumpFile.seek(offset); line = dumpFile.readLine();` I don't know if that's very helpful, it's totally generic use of file offsets. Basically I have an index I've made of arbitrary [MediaWiki dump files](http://dumps.wikimedia.org/backup-index.html) which can be enormous. I have a suite of tools that create these index files and use them to extract arbitrary info from the dump files without having to parse through many gigabytes of XML. Perl and C up till now. – hippietrail Dec 01 '12 at 01:05
  • Yeah, that should totally be supported. Give it a swing and a miss and see what happens. Additionally, to do the link you have to make the URL _look_ like a URL, to wit: //dumps.wiki... or http://dumps.wiki.... – jcolebrand Dec 01 '12 at 01:07
  • Damn cut and paste from Google Chrome URL bar (-; – hippietrail Dec 01 '12 at 01:08
  • hitting and then (or command if you're on OSX) should totally copy the http. Not sure how you broke that. Just learn to use the keyboard. Ctrl-L is great for C-c C-v stuff. – jcolebrand Dec 01 '12 at 01:09
  • @jcolebrand: Yeah I opened a new tab, entered `dumps`, waited for the URL bar to find relevant URLs, used down arrow to get to the right one, then did CTRL+C without actually going to the page. I know Chrome gets the `http://` part when you are actually on the page. I have a talent for finding edge cases (-; – hippietrail Dec 01 '12 at 01:12

2 Answers


Node.js is compiled with _LARGEFILE_SOURCE and _FILE_OFFSET_BITS on all platforms, so internally it should be safe for large file access. (See the common.gypi in the root of the source dir.)

In terms of the libraries, it uses Number for the start (and end) options when creating read and write streams (see fs.createReadStream). This means you can address positions up to 2^53 through node (see also: What is JavaScript's highest integer value that a Number can go to without losing precision?). This is visible in the lib/fs.js code.
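
As a rough sketch of how the `start` and `end` options might be used for offsets past 2^31, assuming a Node.js version with 64-bit offset support and `Number.isSafeInteger`. `sliceOptions` and `readSlice` are hypothetical helper names, not part of `fs`:

```javascript
var fs = require('fs');

// Hypothetical helper: build the { start, end } options for
// fs.createReadStream. Both ends are inclusive byte positions.
function sliceOptions(start, length) {
  var end = start + (length - 1);
  // Reject anything a Number cannot represent exactly (2^53 and above).
  if (!Number.isSafeInteger(start) || !Number.isSafeInteger(end) ||
      start < 0 || length < 1) {
    throw new RangeError('slice must lie within exactly representable offsets');
  }
  return { start: start, end: end };
}

// Hypothetical helper: stream one slice of a file and hand the collected
// bytes to callback(err, buffer).
function readSlice(path, start, length, callback) {
  var chunks = [];
  fs.createReadStream(path, sliceOptions(start, length))
    .on('data', function (chunk) { chunks.push(chunk); })
    .on('error', callback)
    .on('end', function () { callback(null, Buffer.concat(chunks)); });
}
```

Each call to `readSlice` opens a fresh stream at the requested position, so this suits a handful of large reads better than many small seeks around the same file.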

Joe
  • Thanks Joe - you linked to the same relevant prior question that I did by the way (-; – hippietrail Dec 01 '12 at 22:47
  • Ha, I totally didn't notice. Was just trying to find something to cover that part of the answer... – Joe Dec 02 '12 at 14:03
  • Actually `_LARGEFILE_SOURCE` / `_FILE_OFFSET_BITS` don't seem to be mentioned in [`common.gypi`](https://github.com/joyent/node/blob/master/common.gypi) unless there's more than one such file and I'm looking at the wrong one? `\-:` – hippietrail Dec 06 '12 at 22:37
  • Ah! https://github.com/joyent/node/commit/83e5e20c2c24f440b9212453336e7495402a9ed8#common.gypi They are removed as they are inherited from libuv now. – Joe Dec 07 '12 at 00:29
  • We're a few revs behind, and I was using my local source tree. – Joe Dec 07 '12 at 00:29
  • It seems the hard part so far is the lack of APIs like `fseek()` and `ftell()` in the standard `fs` module ... – hippietrail Dec 18 '12 at 12:19
  • Yeah, you don't do it that way in the current Node. You specify the offset when you createReadStream; each time you need a different portion, you create a new stream with a new `start`. Which might not be that efficient if you require a lot of seeking for small bits around a large file. – Joe Dec 18 '12 at 12:29
  • Despite the build flags, now that I've started porting my project that requires this, I've found that `fs.read(files.d.fd, chunk, 0, 1023, 0x7fffffff, function (err, bytesRead, data)` succeeds but `0x80000000` fails with `EINVAL`. This is with version `v0.6.11` running on 32-bit Windows 7. – hippietrail Dec 24 '12 at 14:53
  • A version of the problem also exists with version `v0.4.8` running on SunOS but in this case it fails silently by seeking to position 0 of the file for any `position` over `0x7fffffff`. (I don't have access to a later version to test under *nix.) – hippietrail Dec 25 '12 at 02:35
  • Interesting. I was looking at 0.8.9 source when answering this. – Joe Dec 25 '12 at 15:22
  • Yes sorry to unvote / unaccept. I was having all kinds of trouble with my code at the time until I tracked down the version issues. I was about to file a bug report. I didn't realize node was still evolving so rapidly and fixing "beginner" issues. I thought I had installed it on Windows very recently. – hippietrail Dec 25 '12 at 18:51

It was a little difficult to track down, but node.js has only supported 64-bit file offsets since version 0.7.9 (unstable), released at the end of May 2012. In the stable series, support starts with version 0.8.0, released at the end of June 2012.

fs: 64bit offsets for fs calls (Igor Zinkovsky)

On earlier versions, failure modes when using larger offsets vary from silently seeking to the beginning of the file to throwing an exception with EINVAL.

See the (now closed) bug report:

File offsets over 31 bits are not supported
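
For the seek-then-read pattern discussed in the question, a positional read is one way to get `fseek()`-like behaviour. The following is a minimal sketch, assuming a current Node.js with `Buffer.alloc` and `Number.isSafeInteger`; `readAt` is a hypothetical helper, not an `fs` API:

```javascript
var fs = require('fs');

// Hypothetical helper: read up to `length` bytes starting at the absolute
// byte `position`, roughly what fseek() followed by fread() would do in C.
function readAt(fd, position, length) {
  // Reject positions that a Number cannot represent exactly.
  if (!Number.isSafeInteger(position) || position < 0) {
    throw new RangeError('position must be a non-negative integer below 2^53');
  }
  var buffer = Buffer.alloc(length);
  // An explicit position reads from that offset without moving the
  // descriptor's own file position.
  var bytesRead = fs.readSync(fd, buffer, 0, length, position);
  // Near the end of the file fewer bytes than requested may come back.
  return buffer.subarray(0, bytesRead);
}
```

On a build with 64-bit offset support, something like `readAt(fd, 0x80000000, 1023)` would be expected to work; on the older versions described above, the same position is where the reported failures begin.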

To check for large file support programmatically from node.js code:

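// Compare the version parts as numbers; using >= on two arrays would compare them as strings.
var v = process.version.substring(1).split('.').map(Number);
if (v[0] > 0 || v[1] > 7 || (v[1] === 7 && v[2] >= 9)) {
  // use 64-bit file offsets...
}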
hippietrail