4

I want to store pointers to the lines of a file in an array, and later retrieve a specified line from disk. I cannot store a pointer to a line directly, because the memory locations will have changed by the time I read the file back. So I am storing each line's offset from the beginning of the file instead. For storing the offset I am using "uint64_t". However, since my file is 200 GB, "uint64_t" is not able to represent all the offsets.

I have the following questions:

  1. Other than storing offsets, is there some other way to store pointers into a file on disk?

  2. Is there some other data type I could use (other than uint64_t)?

Rose Beck

4 Answers

9

On POSIX systems, off_t is the standard type for file offsets. It is probably a 64-bit type just like uint64_t, though, and either can hold values on the order of 2e11 (about 200 GB) without trouble.
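As a rough sketch of how line offsets could be recorded and used with off_t: the helper names build_line_index and read_line_at are made up for illustration, and the code assumes a POSIX system where fseeko is available and the stream is seekable. The _FILE_OFFSET_BITS define is a glibc convention for requesting a 64-bit off_t on 32-bit targets.

```c
#define _POSIX_C_SOURCE 200809L /* expose fseeko() under strict -std modes */
#define _FILE_OFFSET_BITS 64    /* glibc: request a 64-bit off_t on 32-bit targets */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: records the byte offset at which each line starts.
 * Returns a malloc'd array (caller frees) and stores the line count in
 * *count, or returns NULL if memory runs out. */
off_t *build_line_index(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);

    size_t cap = 1024, n = 0;
    off_t *index = malloc(cap * sizeof *index);
    if (index == NULL)
        return NULL;

    off_t pos = 0;          /* offset of the byte about to be read */
    int c, at_line_start = 1;

    rewind(fp);
    while ((c = getc(fp)) != EOF) {
        if (at_line_start) {
            if (n == cap) { /* grow the array when it is full */
                cap *= 2;
                off_t *tmp = realloc(index, cap * sizeof *tmp);
                if (tmp == NULL) {
                    free(index);
                    return NULL;
                }
                index = tmp;
            }
            index[n++] = pos;
        }
        at_line_start = (c == '\n');
        pos++;
    }
    *count = n;
    return index;
}

/* Hypothetical helper: seeks to a stored offset and reads one line into buf.
 * Returns 0 on success, -1 on failure. */
int read_line_at(FILE *fp, off_t offset, char *buf, size_t size)
{
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, (int)size, fp) != NULL ? 0 : -1;
}
```

The index itself can be written to disk and reloaded later, since the offsets stay valid as long as the file is not modified.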

Fred Foo
7

You're wrong. A uint64_t is 64 bits, so you can express offsets in files up to 2^64 bytes = 18.45 exabytes with it. According to Wolfram Alpha, you can compare that to:

  • estimated information content of all human knowledge (as of mid-1999) (~ 12 EB )
  • 180 × purported storage capacity of the character Data in Star Trek: The Next Generation ( 8×10^17 b )

No way your files are that big. :)

thejh
5

A 64-bit unsigned integer should be plenty large enough to store the byte offset into a 200 GB file.

200 GB = 200 GB * 1024 MB/GB * 1024 KB/MB * 1024 Bytes/KB = 214,748,364,800 Bytes

However, a 64-bit unsigned integer has the range:

Low: 0, High: 18,446,744,073,709,551,615

I don't see the issue. You can easily index into every byte of that file. As a matter of fact, you could index into every bit of the file and still have lots of room for growth!
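The arithmetic above can be checked in a few lines of C. This is only a sketch: the constant and function names are invented for illustration, and the file size uses binary gigabytes as in the calculation above.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the file from the question: 200 * 1024^3 bytes. */
static const uint64_t FILE_SIZE_BYTES = UINT64_C(200) * 1024 * 1024 * 1024;

/* How many files of this size fit into the uint64_t offset range. */
uint64_t files_that_fit(void)
{
    return UINT64_MAX / FILE_SIZE_BYTES;
}

/* Largest bit offset needed to address every bit of the file. */
uint64_t last_bit_offset(void)
{
    assert(FILE_SIZE_BYTES <= UINT64_MAX / 8); /* the multiplication cannot overflow */
    return FILE_SIZE_BYTES * 8 - 1;
}
```

Even when addressing individual bits, the largest offset is on the order of 1.7e12, far below the uint64_t maximum of roughly 1.8e19.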

STLDev
    Actually, to exceed the 64-bits range, the file would have to be bigger than the storage capacity of any data-center I have ever heard of :) – Matthieu M. Jun 11 '13 at 11:19
0

You can use an encoding scheme to store the offset value.

Example: divide each file offset by 2 or 4 and keep the result in a uint64_t variable. This reduces the range of values that has to be stored.

When reading the data back, take the uint64_t value and multiply it by the same factor (2 or 4) to get the offset again. Note that this recovers the exact offset only if every stored offset is a multiple of that factor.
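A minimal sketch of this scheme, assuming a divisor of 4; SCALE and the function names are hypothetical. It also shows the catch: integer division discards the remainder, so only offsets that are multiples of the divisor survive the round trip.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE 4u /* hypothetical divisor; the answer suggests 2 or 4 */

/* True if the offset can be stored without losing information. */
int is_representable(uint64_t offset)
{
    return offset % SCALE == 0;
}

/* Stores an offset in scaled form; the offset must be a multiple of SCALE. */
uint64_t encode_offset(uint64_t offset)
{
    assert(is_representable(offset));
    return offset / SCALE;
}

/* Recovers the original offset from its scaled form. */
uint64_t decode_offset(uint64_t stored)
{
    return stored * SCALE;
}
```

Line starts in a text file generally fall on arbitrary byte positions, so this only works if the lines are padded to a multiple of SCALE.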