3

Quoting from the HDF5 Hyperslab doc:

The block array determines the size of the element block selected from the dataspace.

The example there uses a 2-dimensional dataset with the parameters set as follows:

start offset is specified as [1,1], stride is [4,4], count is [3,7], and block is [2,2]

This will result in 21 2x2 blocks, with the selections at (1,1), (5,1), (9,1), (1,5), (5,5), and so on. I can understand that the selection starts at (1,1) because that is the starting point. Since the stride is (4,4), it moves 4 elements in each dimension, and since the count is (3,7), it takes 3 steps of 4 in direction X and 7 steps of 4 in direction Y, i.e. in the corresponding dimensions.

But what I don't understand is what the block size is doing. Does it mean that I will get 21 blocks of size 2x2? That would mean each block contains 4 elements, but the count is already set to 3 in one dimension, so how is that possible?

Community
ng.newbie

2 Answers

2

A hyperslab selection created through H5Sselect_hyperslab() lets you create a region defined by a repeating block of elements.

This is described in section 7.4.2.2 of the HDF5 users guide found here (scroll down a bit to 7.4.2.2). The H5Sselect_hyperslab() reference manual entry might also be helpful.

Here is a diagram from the UG:

[Figure: hyperslab selection diagram from the HDF5 User's Guide]

And here are the values used in that figure:

  • offset = (0,1)
  • stride = (4,3)
  • count = (2,4)
  • block = (3,2)

Notice how the repeating unit is a 3x2 element block. So yes, you will get 21 2x2 blocks in your case. There will be a grid of three blocks in one dimension and seven in the other, each spaced 4 elements apart in each direction. The first block will be offset by 1,1.
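The arithmetic behind that answer can be checked without HDF5 at all. The following is only a plain-Python sketch of the selection rule, not the library's implementation; `block_origins` and `selected_elements` are hypothetical helper names introduced here for illustration.

```python
from itertools import product


def block_origins(start, stride, count):
    """Return the lowest-index element of every block in the selection."""
    per_dim = [
        [s + i * st for i in range(c)]  # count[d] origins, stride[d] apart
        for s, st, c in zip(start, stride, count)
    ]
    return list(product(*per_dim))


def selected_elements(start, stride, count, block):
    """Return the set of all element coordinates covered by the selection."""
    elements = set()
    for origin in block_origins(start, stride, count):
        # Each origin is expanded into a block of block[d] elements per dimension.
        for offset in product(*(range(b) for b in block)):
            elements.add(tuple(o + d for o, d in zip(origin, offset)))
    return elements


origins = block_origins(start=(1, 1), stride=(4, 4), count=(3, 7))
elements = selected_elements(start=(1, 1), stride=(4, 4), count=(3, 7), block=(2, 2))
print(len(origins))   # 21 blocks: 3 in one dimension times 7 in the other
print(len(elements))  # 84 elements: 21 blocks of 2x2 = 4 elements each
```

So count decides how many blocks there are (3 x 7 = 21), and block decides how big each one is (2 x 2 = 4 elements), which is why the two do not conflict.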

The most confusing thing about this API call is that three of the parameters have elements as their units, while count has blocks as its unit.
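To make the units concrete, here is a small plain-Python sketch that draws the selection for the figure's values as text. It does not call HDF5; `render_selection` is a hypothetical helper, and both the 8x12 dataset size and the choice to treat the first index as the row are assumptions made for the illustration.

```python
def render_selection(dims, start, stride, count, block):
    """Draw a 2-D hyperslab selection: '#' is selected, '.' is not."""

    def selected_indices(s, st, n_blocks, b):
        # start, stride and block are measured in elements;
        # n_blocks (the count) is measured in blocks.
        return {s + i * st + j for i in range(n_blocks) for j in range(b)}

    rows = selected_indices(start[0], stride[0], count[0], block[0])
    cols = selected_indices(start[1], stride[1], count[1], block[1])
    return [
        "".join("#" if r in rows and c in cols else "." for c in range(dims[1]))
        for r in range(dims[0])
    ]


grid = render_selection(dims=(8, 12), start=(0, 1), stride=(4, 3),
                        count=(2, 4), block=(3, 2))
print("\n".join(grid))
# .##.##.##.##
# .##.##.##.##
# .##.##.##.##
# ............
# .##.##.##.##
# .##.##.##.##
# .##.##.##.##
# ............
```

Each group of '#' is one 3x2 block. There are 2 groups vertically and 4 horizontally, matching count = (2,4), and the groups start 4 rows and 3 columns apart, matching stride = (4,3).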

Edit: Perhaps this will make how block and count are used more obvious...

[Figure: diagram illustrating how block and count are used]

Dana Robinson
  • Then what is the difference between count and block? – ng.newbie Oct 07 '17 at 17:36
  • If you are looking at the lower region of the figure that represents dataset storage (ignore the upper part, which represents memory and ignore the numbers), the block array determines the size of a single gray region (in elements). The count array indicates how many copies of the block exist in each dimension. – Dana Robinson Oct 08 '17 at 04:59
  • Okay, for example, imagine a 2x2 dataset. If I make the block size 3x2, will that result in an error? – ng.newbie Oct 09 '17 at 09:01
  • Yes, you will get an error. The dataspace selection cannot exceed the bounds of the dataset. If you try to write to such a selection, you will get a "file selection+offset not within dataset" error. – Dana Robinson Oct 09 '17 at 11:28
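The bounds rule from the last comment can be sketched as simple arithmetic. This is plain Python, not an HDF5 call; `required_extent` and `fits` are hypothetical helpers, and the sketch assumes 0-based indices and a count of at least 1 in every dimension.

```python
def required_extent(start, stride, count, block):
    """Smallest dataset size per dimension that can hold the selection."""
    # The last block starts at start + (count - 1) * stride and spans block elements.
    return tuple(
        s + (c - 1) * st + b
        for s, st, c, b in zip(start, stride, count, block)
    )


def fits(dims, start, stride, count, block):
    """True if the selection stays inside a dataset of size dims."""
    needed = required_extent(start, stride, count, block)
    return all(n <= d for n, d in zip(needed, dims))


# A single 3x2 block does not fit in a 2x2 dataset.
print(fits(dims=(2, 2), start=(0, 0), stride=(1, 1), count=(1, 1), block=(3, 2)))  # False

# The selection from the question needs at least an 11x27 dataset.
print(required_extent(start=(1, 1), stride=(4, 4), count=(3, 7), block=(2, 2)))  # (11, 27)
```

A check like this only mirrors the rule described above; the actual error is reported by the library.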
-3

The HDFS default block size is 64 MB, which can be increased according to our requirements. One mapper processes one block at a time.