
I am trying to write a simple structure to HDF5, but I still can't add records to an already created dataset. The task is to store a sequence of bytes of arbitrary length together with its size.

void WriteStructToFile(vector<vector<char>> fl_stream)
{
    struct data
    {
        int size_array;
        char * value_array;
    };
    int status;
    string  query;
    string FileName = "Fly_stream_file.h5";
    string NameGroup = "Stream";
    string Namedataset = "Inclusions";

    //------------------------------- Create file and dataset -------------
    query = "CREATE TRUNCATE FILE " + FileName;
    status = HDFql::execute(query.c_str());
    query = "USE FILE " + FileName;
    status = HDFql::execute(query.c_str());
    query = "CREATE GROUP " + NameGroup;
    status = HDFql::execute(query.c_str());
    // Pass explicit member offsets and the compound size so that the dataset layout matches struct data
    stringstream scriptst;
    scriptst << "CREATE TRUNCATE CHUNKED DATASET " << Namedataset
             << " AS COMPOUND(size_array AS INT OFFSET " << offsetof(struct data, size_array)
             << ", value_array AS VARCHAR OFFSET " << offsetof(struct data, value_array)
             << ")(UNLIMITED) SIZE " << sizeof(struct data);
    status = HDFql::execute(scriptst);
    //clear request
    scriptst.str(std::string());
    scriptst.clear();

    //--------------- Fill dataset ----------------
    // Simulate the arrival of data in the function
    struct data realdata;
    int number = HDFql::variableRegister(&realdata);
    for (size_t i = 0; i < fl_stream.size(); i++)
    {
        realdata.size_array = static_cast<int>(fl_stream[i].size());
        realdata.value_array = fl_stream[i].data();

        // write data to the dataset
        scriptst << "INSERT INTO " << Namedataset << "(-1) VALUES FROM MEMORY " << number
                 << " SIZE " << sizeof(struct data)
                 << " OFFSET(" << offsetof(struct data, size_array) << ", " << offsetof(struct data, value_array) << ")";
        status = HDFql::execute(scriptst);

        HDFql::execute("ALTER DIMENSION Inclusions TO +1");
        scriptst.str(std::string());
        scriptst.clear();
    }
    HDFql::variableUnregister(&realdata);
    status = HDFql::execute("CLOSE FILE");
}

[Image: the result of the corrected code]

OlegMart

1 Answer


The script that creates the dataset has a syntax error and lacks the specification of members' offsets and the compound's size. It should be as follows:

sprintf(script, "CREATE TRUNCATE CHUNKED DATASET Inclusions AS COMPOUND(sizear AS INT OFFSET %zu, value AS VARCHAR OFFSET %zu)(UNLIMITED) SIZE %zu", offsetof(struct data, sizear), offsetof(struct data, value), sizeof(struct data));
status = HDFql::execute(script);
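
As a type-safe alternative to sprintf, the same statement could be assembled with a std::ostringstream, which sidesteps mismatched format specifiers for the size_t values that offsetof and sizeof return. This is only a sketch: the helper name buildCreateStatement is hypothetical, the struct mirrors the one in the question with the member names used in this answer, and the resulting string would still have to be passed to HDFql::execute().

```cpp
#include <cstddef>   // offsetof
#include <sstream>
#include <string>

// Mirrors the struct from the question (member names as used in this answer).
struct data
{
    int sizear;
    char *value;
};

// Hypothetical helper: builds the CREATE statement text only; it does not call HDFql.
std::string buildCreateStatement(const std::string &dataset)
{
    std::ostringstream script;
    script << "CREATE TRUNCATE CHUNKED DATASET " << dataset
           << " AS COMPOUND(sizear AS INT OFFSET " << offsetof(struct data, sizear)
           << ", value AS VARCHAR OFFSET " << offsetof(struct data, value)
           << ")(UNLIMITED) SIZE " << sizeof(struct data);
    return script.str();
}
```

Calling buildCreateStatement("Inclusions") yields the statement shown above, with the offsets and size filled in by the compiler for the current platform.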

Also, the second write to dataset Inclusions should use a hyperslab/point selection (otherwise, previously written data will be overwritten). Therefore, do the following:

sprintf(script, "INSERT INTO Inclusions(-1) VALUES FROM MEMORY %d SIZE %zu OFFSET(%zu, %zu)", number, sizeof(struct data), offsetof(struct data, sizear), offsetof(struct data, value));
status = HDFql::execute(script);

Moreover, declare member value of struct data as a pointer instead: char *value;

Finally, to make the code simpler, remove the call to function c_str() from string query as you can pass it directly to function HDFql::execute().
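
Taken together, these points suggest a write loop of roughly the following shape. The sketch only demonstrates the control flow: execute() is a hypothetical stand-in that records statements instead of calling HDFql::execute(), and the memory number 0 is a placeholder for the value returned by HDFql::variableRegister(). The order of INSERT and ALTER DIMENSION follows the code in the question.

```cpp
#include <cstddef>   // offsetof
#include <sstream>
#include <string>
#include <vector>

struct data
{
    int sizear;
    char *value;
};

// Hypothetical stand-in for HDFql::execute(): records the statement instead of running it.
std::vector<std::string> executed;
int execute(const std::string &script)
{
    executed.push_back(script);
    return 0;
}

void writeRecords(std::vector<std::vector<char>> &stream)
{
    struct data record;
    const int number = 0; // placeholder for the value returned by HDFql::variableRegister(&record)

    // The INSERT statement text does not depend on the record, so build it once.
    std::ostringstream insert;
    insert << "INSERT INTO Inclusions(-1) VALUES FROM MEMORY " << number
           << " SIZE " << sizeof(struct data)
           << " OFFSET(" << offsetof(struct data, sizear) << ", "
           << offsetof(struct data, value) << ")";
    const std::string insertScript = insert.str();

    for (std::vector<char> &bytes : stream)
    {
        // Only the contents of the registered variable change; its address stays the same.
        record.sizear = static_cast<int>(bytes.size());
        record.value = bytes.data();

        execute(insertScript);                       // same statement text on every iteration
        execute("ALTER DIMENSION Inclusions TO +1"); // grow the dataset, as in the question
    }
}
```

In real code the two execute() calls would go to HDFql::execute(), with HDFql::variableRegister(&record) before the loop and HDFql::variableUnregister(&record) after it.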

SOG
  • Thanks, the dataset was created. I could not create it with the member names size and value; I always got -1. Also, I can't remove `c_str()` from the request because the commands then fail to execute; later I will try a stringstream. I have not yet been able to add a record to the dataset with the second line either, although `ALTER DIMENSION` does take effect, judging by the HDF viewer. – OlegMart Dec 02 '21 at 23:26
  • I have updated my answer - your code should work properly now (please note that I have also updated the way one creates dataset `Inclusions`). Strange that you have to call function `c_str()` so that you can pass variable `query` into function `HDFql::execute()` since it accepts `std::string` as a parameter (from the very first release version of the HDFql C++ wrapper). – SOG Dec 03 '21 at 15:57
  • Thanks a lot for the tips. There are a couple of questions that I would like to clarify. – OlegMart Dec 03 '21 at 17:57
  • 1) How often can I register and unregister a variable, as in `int number = HDFql::variableRegister(&realdata);`? 2) `scriptst << "INSERT INTO dsetint(-1) VALUES(" << val << ")"; status = HDFql::execute(scriptst);` If I use similar code in a loop, do I need to build the request anew each time, or can I reuse the old one when only the values have changed? – OlegMart Dec 03 '21 at 18:06
  • Concerning question #1: there is no limit on how many times you can register and unregister a variable. Concerning question #2: you just need to register the variable once before the loop starts, populate it with values within the loop (so that HDFql can write it into the dataset - I believe this is what you want to do), and unregister it after the loop is over. This logic holds true as long as your variable doesn't change address within the loop. – SOG Dec 03 '21 at 19:42
  • You can safely move the line `scriptst << "INSERT INTO " << Namedataset << "(-1) VALUES FROM MEMORY " << number<< " SIZE " << sizeof(struct data) << " OFFSET(" << offsetof(struct data, size_array) << ", " << offsetof(struct data,value_array) << ")";` before the loop starts (since this line is invariant) so that your code is more performant. If you do this, you can also remove the lines `scriptst.str(std::string());` and `scriptst.clear();` since they are no longer needed. – SOG Dec 05 '21 at 12:19
  • Oh, thanks for this remark, it will be really helpful. – OlegMart Dec 09 '21 at 12:01
  • One more question: how does HDFql find out the size of the data that needs to be written to the file if the member in the structure is a pointer? It also surprises me that I could not replace the data type in the dataset with VARTINYINT. – OlegMart Dec 14 '21 at 12:44
  • Basically, the way HDFql finds out the size of the data to be written into the dataset (or attribute, for that matter) is by looking at its size and then writing that amount of data stored in the variable into the dataset. Concerning VARTINYINT not working as expected, which member of the compound dataset are you referring to? – SOG Dec 16 '21 at 16:46
  • In that question, I meant replacing VARCHAR with VARTINYINT in the structure; after all, both element types are 1 byte. Also, I don't understand: if we always pass a structure size of 16 bytes on x64, how is the VARCHAR string written in full when it exceeds these 16 bytes? I tried to figure it out with `DIRECTLY [SIZE data_size]`, but it didn't work. – OlegMart Dec 17 '21 at 16:02
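
On the 16-byte question, it may help to look at what the struct itself contains. The sketch below (plain C++, no HDFql calls) shows that struct data holds only an int and a pointer, so its size does not depend on the length of the bytes the pointer refers to. It is an assumption here, based on how HDF5 represents variable-length strings in memory, that a VARCHAR member is read by following the pointer up to the first null byte. If that holds, a byte sequence with embedded zeros would be cut short, which the hypothetical helper visibleLength() illustrates with std::strlen.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::strlen
#include <vector>

struct data
{
    int size_array;
    char *value_array;
};

// The struct stores the pointer itself, never the bytes it points to.
static_assert(sizeof(struct data) >= sizeof(int) + sizeof(char *),
              "struct holds an int and a pointer");
static_assert(offsetof(struct data, size_array) == 0,
              "first member starts at offset 0");

// Hypothetical helper: the length a C-string reader would see.
// The buffer must contain at least one '\0', otherwise the read runs past the end.
std::size_t visibleLength(const std::vector<char> &bytes)
{
    return std::strlen(bytes.data()); // stops at the first '\0'
}
```

On a typical x64 build the struct occupies 16 bytes: 4 for the int, 4 of padding, and 8 for the pointer. For a buffer such as {'a', '\0', 'b', '\0'}, visibleLength() reports 1 even though the vector holds 4 bytes.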