
I have a huge HDF5 file (~100GB, contiguous storage) and I need random access to individual points in it. Indexing in Python/h5py or calling `H5Dread` in C seems to be very slow, so I want to mmap the data directly.

In fact, this works with h5py/numpy on my local 64-bit Fedora 25 machine, following this. But on a remote cluster, numpy/mmap fails for large files (`[Errno 12] Cannot allocate memory`), even though Python appears to be 64-bit and a simple test that mmaps a 100GB file in C works. So there might be something wrong with my cluster's Python.

One solution I see is to use mmap in C. I wrote a small test that creates a small HDF5 file with a 1D dataset and gets the dataset's offset using `H5Dget_offset`. However, the results are not correct.

Here is the core code:

```c
/* Get dataset offset within file */
file_id = H5Fopen (FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
dataset_id = H5Dopen2(file_id, "/dset", H5P_DEFAULT);
offset = H5Dget_offset(dataset_id);

fd = open(FILE, O_RDONLY);
// align with page size
pa_offset = offset & ~(sysconf(_SC_PAGE_SIZE) - 1);
length = NX * NY * sizeof(int);
addr = mmap(NULL, length + offset - pa_offset, PROT_READ,
          MAP_PRIVATE, fd, pa_offset);
```

The discussion under this blog post mentions that Julia's implementation achieves this through `H5Fget_vfd_handle` and `H5Dget_offset`, but I haven't found a detailed or simple explanation.

  • The offset I get through h5py's `dataset.id.get_offset()` is identical to the one I get through `H5Dget_offset` in C.
  • I think my core question is: how do I use the offset returned by `H5Dget_offset` in C to mmap the dataset?
  • Should mmap be much faster than plain HDF5 access in the first place?
too honest for this site
Liang
  • Why memory mapping instead of just seeking and reading? – tadman Oct 08 '17 at 21:25
  • @tadman, by "seeking and reading" do you mean direct indexing? I tried that, but it was slow. I don't know whether indexing incurs some extra overhead. In my case I randomly access a single data point each time and keep cycling, instead of slicing, which is the typical HDF5 usage. – Liang Oct 08 '17 at 22:00
  • If you open the file in [unbuffered mode](https://stackoverflow.com/questions/20342772/buffered-and-unbuffered-inputs-in-c) then you have pretty direct, raw, low-level access to the file. Using `fseek`/`fread` you can get data out of any spot you want, random access. By default file reads are buffered which can be a drag on performance unless you're doing linear reads. – tadman Oct 08 '17 at 22:03
  • Thank you @tadman. I will try the unbuffered mode. However, I feel my problem is that the offset returned by `H5Dget_offset` is not exactly the offset from the start of the file I opened through `open` (the file descriptor `fd` in the code) to the beginning of the actual data. Maybe this offset has to be used together with the handle returned by `H5Fget_vfd_handle`, but I haven't figured out how to use that handle (which, as far as I can tell, is an address rather than a file descriptor). Using `fseek`/`fread` would still need the correct offset, anyway. – Liang Oct 08 '17 at 22:13
  • I'm not sure what's going on inside that file, but it's possible there's a header or some framing that's not supposed to be counted as part of the offset. In many binary file formats there are nested structures that maintain their own independent offsets. – tadman Oct 08 '17 at 22:14
  • Why are you using `pa_offset` as the `offset` argument to the `mmap` call? This means you're probably not actually using the value given to you by `H5Dget_offset`! – bnaecker Oct 08 '17 at 22:47
  • The offset has to be aligned with page size, according to [this](https://www.gnu.org/software/libc/manual/html_node/Memory_002dmapped-I_002fO.html). The fix is to shift the returned pointer back again. – Liang Oct 08 '17 at 23:38
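The alignment arithmetic discussed in these comments can be checked in isolation. This is a minimal sketch assuming the page size is a power of two (true on typical Linux systems); `struct aligned_map` and `plan_mapping` are hypothetical names, not part of HDF5 or POSIX.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper result: how to map `length` bytes that start at an
 * arbitrary file offset, given that mmap() wants a page-aligned offset. */
struct aligned_map {
    off_t  map_offset; /* page-aligned offset to pass to mmap() */
    size_t delta;      /* bytes from the start of the mapping to the data */
    size_t map_length; /* total length to pass to mmap() */
};

static struct aligned_map plan_mapping(off_t offset, size_t length, long page_size)
{
    struct aligned_map m;
    /* the mask trick below only works for powers of two */
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
    assert(offset >= 0);
    m.map_offset = offset & ~((off_t)page_size - 1); /* round down to a page */
    m.delta = (size_t)(offset - m.map_offset);       /* bytes lost by rounding */
    m.map_length = length + m.delta;                 /* still cover all data */
    return m;
}
```

For example, with a 4096-byte page, an `offset` of 10000 and 15 ints (60 bytes), `map_offset` is 8192, `delta` is 1808 and `map_length` is 1868. An offset smaller than one page gives `map_offset == 0`, so the whole shift comes from `delta`.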

2 Answers


The main cause of your problem has nothing to do with the HDF library. You're not mapping the bytes that the HDF library is telling you correspond to the dataset.

H5Dget_offset returns the offset, in bytes, from the start of the file to the beginning of the dataset in question. But you're not passing that value to mmap(2). You're computing the multiple of the page size just below the actual offset, and then using that as your offset into the file in your mmap(2) call.

Instead of:

```c
mmap(..., pa_offset);
```

you should have

```c
mmap(..., offset);
```

As to whether this will be any faster: the HDF library is complex. There is likely to be a good bit of overhead (bounds checks, permission checks, other library calls), but it's also likely to be fairly well optimized. The only reasonable way to decide whether memory-mapping is faster is to measure it.
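One way to follow that advice is a small harness that times the same random single-element reads through `pread()` and through a mapping. This is a sketch, not a rigorous benchmark: it uses `pread()` on a plain file instead of HDF5 so it runs without the library, it assumes the ints start at a byte `offset` that is a multiple of `sizeof(int)`, and all names (`bench_random_reads`, `struct bench_result`, etc.) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

struct bench_result {
    long long sum_pread, sum_mmap; /* checksums; the two must agree */
    double t_pread, t_mmap;        /* wall-clock seconds */
};

static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* One pread() per element: a system call for every random access. */
static long long sum_pread(int fd, off_t base, const size_t *idx, size_t count)
{
    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        int v = 0;
        ssize_t n = pread(fd, &v, sizeof v, base + (off_t)(idx[i] * sizeof(int)));
        assert(n == (ssize_t)sizeof v);
        sum += v;
    }
    return sum;
}

/* Same indices through the mapping: page faults instead of system calls. */
static long long sum_mapped(const int *data, const size_t *idx, size_t count)
{
    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[idx[i]];
    return sum;
}

/* Times `count` random single-int reads from `nelem` ints stored at byte
 * `offset` of `fd`. Assumes every idx[i] < nelem. Returns -1 if mmap() fails. */
static int bench_random_reads(int fd, off_t offset, size_t nelem,
                              const size_t *idx, size_t count,
                              struct bench_result *r)
{
    long page = sysconf(_SC_PAGE_SIZE);
    off_t pa_offset = offset & ~((off_t)page - 1); /* page-aligned for mmap() */
    size_t delta = (size_t)(offset - pa_offset);
    size_t maplen = nelem * sizeof(int) + delta;
    struct timespec t0, t1;

    void *addr = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, fd, pa_offset);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    const int *data = (const int *)((const char *)addr + delta);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    r->sum_pread = sum_pread(fd, offset, idx, count);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    r->t_pread = seconds_between(t0, t1);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    r->sum_mmap = sum_mapped(data, idx, count);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    r->t_mmap = seconds_between(t0, t1);

    munmap(addr, maplen);
    return 0;
}
```

Because the `pread()` pass runs first, it warms the page cache for the `mmap()` pass; for a fair comparison on a 100GB file, run each variant in a separate process with a cold cache, or alternate their order.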

bnaecker
  • I think `mmap` itself still needs `pa_offset`, per the POSIX specification. The bug is that the returned pointer must then be advanced by `offset - pa_offset`, e.g., `int *ptr = (int *)((char *)addr + offset - pa_offset);`. In fact, passing `offset` directly to `mmap` gives segmentation faults in my test. – Liang Oct 08 '17 at 23:35
  • @Liang Wow, I honestly did not know that the `offset` had to be a multiple of the page size. I guess the implementations I've worked with have been less than POSIX compliant! Your posted solution does the same thing I was trying to accomplish, but compliantly. – bnaecker Oct 08 '17 at 23:54
  • Right. Interestingly, in my test with tiny data (15 ints), and thus probably tiny overhead, `pa_offset` is zero, since the real `offset` is smaller than the page size. – Liang Oct 09 '17 at 00:00

This is my own answer to the question.

Following this HDS implementation, I found a bug in my original code, but the fix is different from @bnaecker's.

Basically, `mmap` still requires `pa_offset`, per the mmap documentation, but the returned pointer must then be shifted forward by `offset - pa_offset`, for example:

```c
int *ptr = (int *)((char *)addr + offset - pa_offset);
```
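The same fix can be sketched without HDF5: map `nelem` ints that start at an arbitrary byte `offset` of an open file, pass the page-aligned offset to `mmap()`, and advance the returned pointer by the remainder. `struct int_view`, `map_ints` and `unmap_ints` are hypothetical names; in the HDF5 case `offset` would come from `H5Dget_offset` and `fd` from `H5Fget_vfd_handle`. The sketch also assumes `offset` is a multiple of `sizeof(int)`, so the shifted pointer is suitably aligned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A read-only view of `nelem` ints that start `offset` bytes into a file. */
struct int_view {
    const int *data;   /* first element of the dataset */
    void      *base;   /* address returned by mmap(), needed for munmap() */
    size_t     maplen; /* length passed to mmap(), needed for munmap() */
};

/* Returns 0 on success, -1 if mmap() fails. */
static int map_ints(int fd, off_t offset, size_t nelem, struct int_view *v)
{
    long page = sysconf(_SC_PAGE_SIZE);
    assert(page > 0 && offset >= 0);
    off_t pa_offset = offset & ~((off_t)page - 1); /* round down to a page */
    size_t delta = (size_t)(offset - pa_offset);   /* bytes skipped by rounding */

    v->maplen = nelem * sizeof(int) + delta;
    v->base = mmap(NULL, v->maplen, PROT_READ, MAP_PRIVATE, fd, pa_offset);
    if (v->base == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    /* shift forward by delta; char * keeps the byte arithmetic standard C */
    v->data = (const int *)((const char *)v->base + delta);
    return 0;
}

static void unmap_ints(struct int_view *v)
{
    munmap(v->base, v->maplen); /* the aligned base, not the shifted pointer */
}
```

Keeping `base` and `maplen` separate from `data` matters because `munmap()` must receive the page-aligned address that `mmap()` returned, not the shifted pointer.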

For later readers' reference, here is the core code for three access methods. In particular, the `setvbuf` trick (unbuffered stdio) mentioned by tadman might further improve random-access performance (not tested, though).

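```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include "hdf5.h"

/* FNAME, NX and NY are assumed to be #defined to match your file */
int main(void)
{
    FILE *fp;
    hid_t file_id, dataset_id;
    herr_t status;
    haddr_t offset;
    void *fhandle;
    int fd;
    off_t pa_offset;
    size_t length;
    void *addr;

    /* Get dataset offset within file */
    file_id = H5Fopen(FNAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    status = H5Fget_vfd_handle(file_id, H5P_DEFAULT, &fhandle);
    dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
    offset = H5Dget_offset(dataset_id); /* HADDR_UNDEF unless contiguous and allocated */

    /* Read through stdio */
    fp = fopen(FNAME, "rb");
    fseek(fp, offset, SEEK_SET);
    int x0[NX*NY];
    fread(x0, sizeof(int), NX*NY, fp);

    /* Get the file descriptor (the handle is an int * for the default sec2 driver) */
    fd = *((int *)fhandle);

    /* Read through POSIX (lseek moves the position of the descriptor HDF5 uses) */
    int x1[NX*NY];
    lseek(fd, offset, SEEK_SET);
    read(fd, x1, NX*NY*sizeof(int));

    /* Read through mmap */
    // page size-aligned offset for mmap
    pa_offset = offset & ~(sysconf(_SC_PAGE_SIZE) - 1);
    length = NX * NY * sizeof(int);
    addr = mmap(NULL, length + offset - pa_offset, PROT_READ,
                MAP_PRIVATE, fd, pa_offset);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    // revert the alignment for correct access (char * makes the byte arithmetic standard C)
    int *x2 = (int *)((char *)addr + offset - pa_offset);

    /* ... use x0, x1, x2 ...; fd belongs to HDF5, so don't close() it */
    munmap(addr, length + offset - pa_offset);
    fclose(fp);
    H5Dclose(dataset_id);
    H5Fclose(file_id);
    return 0;
}
```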
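The unbuffered-stdio idea from tadman's comment can be sketched as follows. It assumes a POSIX system (for `fseeko()` and `off_t`); `make_unbuffered` and `read_int_at` are hypothetical helpers and, like the answer above, this has not been benchmarked, so whether `_IONBF` actually helps should be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Switch a freshly opened stream to unbuffered mode.
 * setvbuf() must be called before any other operation on the stream. */
static int make_unbuffered(FILE *fp)
{
    return setvbuf(fp, NULL, _IONBF, 0);
}

/* Read element `i` of an int array that starts `offset` bytes into the file.
 * fseeko() takes an off_t, so offsets beyond 2GB work where off_t is 64-bit.
 * Returns 0 on success, -1 on seek failure or short read (e.g. past EOF). */
static int read_int_at(FILE *fp, off_t offset, size_t i, int *out)
{
    assert(fp != NULL && out != NULL);
    if (fseeko(fp, offset + (off_t)(i * sizeof(int)), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

Call `make_unbuffered(fp)` right after `fopen(FNAME, "rb")`. With buffering, every seek discards the stdio buffer and the next `fread()` refills a whole buffer (typically several KB) to return one value; unbuffered, each one-int read goes more or less straight to the underlying `read()`.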
Liang