
I'm trying to use a sparse file to store a sparse array of data. I thought the code had no bugs, but the unit tests keep failing. After inspecting the code many times, I decided to check the file content after every step and found that the holes were not being created. That is: writing the first element, seeking past x elements, and then writing the second element ends up with the first element immediately followed by the second in the file, with no space at all between them.

My simplified code:

    FILE* file = fopen64(fn.c_str(), "ar+b");
    auto const entryPoint = 220; //calculated at runtime, the size of each element is 220 bytes
    auto r = fseeko64(file, entryPoint, SEEK_SET);
    if(r!=0){
        std::cerr << "Error seeking file" << std::endl;
    }

    size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
    if(written != 1) {
        perror("error writing file");
    }
    fclose(file);

The offset is calculated correctly. The code writes the first element, is supposed to leave 20 elements empty, and then writes the 22nd element. Inspecting the file with a hex dump, however, shows the 2 elements at offsets 0 and 220 (directly after the first element). The unit tests also fail because reading the 2nd element actually returns element number 22.

Could anyone explain what is wrong with my code? Maybe I misunderstood the concept of holes?

------Edit1------

Here's my full code

Read function:

    FILE* file = fopen64(fn.c_str(), "r+b");
    if(file == nullptr){
        memset(page->entries, 0, sizeof(page->entries));
        return ;
    }
    MoveCursor(file, id, sizeof(page->entries));
    size_t read = fread(&page->entries[0], sizeof(page->entries), 1, file);
    fclose(file);
    if(read != 1){ //didn't read a full page.
        memset(page->entries, 0, sizeof(page->entries));
    }

Write function:

    auto fn = dir.path().string() + std::filesystem::path::preferred_separator + GetFileId(page->pageId);
    FILE* file = fopen64(fn.c_str(), "ar+b");
    MoveCursor(file, page->pageId, sizeof(page->entries));
    size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
    if(written != 1) {
        perror("error writing file");
    }
    fclose(file);

    void MoveCursor(FILE* file, TPageId pid, size_t pageMultiplier){
        auto const entryPoint = pid * pageMultiplier;
        auto r = fseeko64(file, entryPoint, SEEK_SET);
        if(r!=0){
            std::cerr << "Error seeking file" << std::endl;
        }
    }

And here's a simplified page class:

    template<typename TPageId, uint32_t EntriesCount>
    struct PODPage {
        bool dirtyBit = false;
        TPageId pageId;
        uint32_t entries[EntriesCount];
    };

The reason I think this is an fseeko problem when writing is that inspecting the file content with xxd shows the data is out of order. Breakpoints in the MoveCursor function show the offset is calculated correctly, and manual inspection of the file's fields shows the offset is set correctly; however, the write doesn't leave a hole.

------Edit2------

Minimal reproducer. The logic is: write the first chunk of data, seek to position 900, write the second chunk of data, then read from position 900 and compare the result to the data that was supposed to be there. Each operation opens and closes the file, which is what happens in my original code; keeping the file open is not allowed. The expected behavior is that a hole is created in the file; the actual behavior is that the file is written sequentially, without holes.

    #include <iostream>
    #define _FILE_OFFSET_BITS 64
    #define __USE_FILE_OFFSET64 1

    #include <stdio.h>
    #include <cstdint>
    #include <cstring>

    int main() {
        uint32_t data[10] = {1,2,3,4,5,6,7,8,9};
        uint32_t data2[10] = {9,8,7,6,5,4,3,2,1};
        {
            FILE* file = fopen64("data", "ar+b");
            if(fwrite(&data[0], sizeof(data), 1, file) !=1) {
                perror("err1");
                return 0;
            }
            fclose(file);
        }
        {
            FILE* file = fopen64("data", "ar+b");
            if (fseeko64(file, 900, SEEK_SET) != 0) {
                perror("err2");
                return 0;
            }
            if(fwrite(&data2[0], sizeof(data2), 1, file) !=1) {
                perror("err3");
                return 0;
            }
            fclose(file);
        }
        {
            FILE* file = fopen64("data", "r+b");
            if (fseeko64(file, 900, SEEK_SET) != 0) {
                perror("err4");
                return 0;
            }
            uint32_t data3[10] = {0};
            if(fread(&data3[0], sizeof(data3), 1, file)!=1) {
                perror("err5");
                return 0;
            }
            fclose(file);
            if (memcmp(&data2[0],&data3[0],sizeof(data))!=0) {
                std::cerr << "err6";
                return 0;
            }
        }
        return 0;
    }
T.Aoukar
  • shouldn't you use sizeof(page->entries[0]) in fwrite ? – engf-010 Apr 07 '21 at 01:33
  • Each element itself is an array. That's why I'm using sizeof(page->entries) – T.Aoukar Apr 07 '21 at 01:46
  • That doesn't sound very logical to me. You should give the definition and type of page->entries. – engf-010 Apr 07 '21 at 01:51
  • If page->entries is an array of arrays, then page->entries[0] is an array that happens to have the same address as page->entries. Basically, your code writes the entire page->entries. If you had used page->entries[1], you would probably have gotten a segfault. – engf-010 Apr 07 '21 at 02:01
  • In short : if you want to write 1 element of an array you must supply the size of the element ,not of the array. – engf-010 Apr 07 '21 at 02:35
  • Sorry for the confusion. Page is a single element, it contains meta data that is calculated at runtime, the persistent part is page.entries the page.entries is just an array of bytes, size is roughly 220 bytes. I'm writing one element (which is a whole page), then offsetting 20 pages, and writing 22nd page. However the file writes page 1 followed by page 22 without empty space between them. Does this make it clear? If not I'll show the definitions after I'm back to the office later. – T.Aoukar Apr 07 '21 at 02:45
  • Can you also add the read part and show where exactly you get unexpected data (the problem may be in reading instead of writing)? Better yet, provide a full program that can be run standalone and reproduces the issue. – Daniel Junglas Apr 07 '21 at 08:06
  • @DanielJunglas sorry for the delay. Added the code. – T.Aoukar Apr 07 '21 at 22:24
  • `Here's my full code Read function:` That's not full code, that's parts of it. `"ar+b"` Is it valid to specify multiple modes in fopen string? Shouldn't it be `a+b`? – KamilCuk Apr 07 '21 at 22:29
  • What other parts are needed from the code? When the offset is sometimes in the middle of the file, is append and binary flags sufficient for this? – T.Aoukar Apr 08 '21 at 00:02
  • It is confusing that you always open the file for reading and writing. You should open it in read-only or write-only mode, depending on what you are going to do. Are you sure MoveCursor succeeds? If the seek fails then your program will write something to `stderr` but will silently continue. Also, are you sure that `fseek` is going to extend the file if you seek beyond the end of the file? I don't think it does. Can you check with `ftell` whether the file is at the expected position after calling `fseek`? – Daniel Junglas Apr 09 '21 at 04:31
  • For write only, I have to add read mode otherwise it ends up removing some data from the file, no idea why it happens but adding the read flag worked, in read function it's opened as read only. MoveCursor succeeds, never wrote anything to stderr. As far as I've read around it should create sparse file: https://en.cppreference.com/w/cpp/io/c/fseek I'll check with ftell asap and write back. – T.Aoukar Apr 09 '21 at 05:46
  • @DanielJunglas I've added a simple `assert(entryPoint == ftell(file));` statement at the end of MoveCursor function, it never failed. As for the confusing ar+b, if I change to w+b it erases all previous content of the file, a+b seems to work for now. – T.Aoukar Apr 09 '21 at 06:15
  • Then I'm afraid your only option is to provide here a full but minimal program that reproduces the issue and that other people could try running on their machines. You should also clarify what exactly the issue is. In your initial post you wrote the entry at position 220 and then found it back at offset 220, so that is expected. – Daniel Junglas Apr 09 '21 at 07:42
  • @DanielJunglas Added a minimal program that reproduces the problem. – T.Aoukar Apr 11 '21 at 10:24

1 Answer


I think your problem is the same as discussed here:

Summary of the two above: If a file is opened for appending (using "a") then fseek only applies to the read position, not to the write position. The write position will always be at the end of the file.

You can fix this by opening the file with "w" or "w+" instead. Both worked for me with your minimal code example.
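
To make the difference concrete, here is a minimal sketch that writes one block at offset 0, reopens the file in a given mode, seeks, writes a second block, and reports the final file size. The function name `write_with_seek` and the file names in the usage note are made up for illustration, and it uses the standard `fopen`/`fseek` rather than the `fopen64`/`fseeko64` variants from the question; append mode should behave the same way with either.

```cpp
#include <cstdio>
#include <cstdint>

// Hypothetical helper: writes a 40-byte block at offset 0, reopens the file
// with `mode`, seeks to `offset`, writes a second 40-byte block, and returns
// the final file size in bytes (or -1 if the file could not be opened).
long write_with_seek(const char* path, const char* mode, long offset) {
    const std::uint32_t first[10]  = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    const std::uint32_t second[10] = {9, 8, 7, 6, 5, 4, 3, 2, 1};

    std::FILE* f = std::fopen(path, "wb");   // start from an empty file
    if (!f) return -1;
    std::fwrite(first, sizeof first, 1, f);
    std::fclose(f);

    f = std::fopen(path, mode);              // the mode under test
    if (!f) return -1;
    std::fseek(f, offset, SEEK_SET);         // in append mode this does not move the write position
    std::fwrite(second, sizeof second, 1, f);
    std::fclose(f);

    f = std::fopen(path, "rb");              // measure the resulting size
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Calling `write_with_seek("demo_a.bin", "a+b", 900)` should return 80, because the second block is appended right after the first, while `write_with_seek("demo_r.bin", "r+b", 900)` should return 940, because the second block lands at offset 900 and the gap is left unwritten. Whether that gap is stored as an actual hole on disk depends on the filesystem.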

Daniel Junglas
  • I see, but opening the file in w or w+ mode will overwrite the previous contents, which is the reason I used the `a` flag. After multiple tries with different flags, apparently the best solution is to open the file with the `r+` flag, which allows editing parts of the existing content or seeking past `EOF`. However, that doesn't work if the file doesn't exist, in which case the file should be opened in `w+` mode, as you mentioned. – T.Aoukar Apr 12 '21 at 22:27