Find a number in a binary file, rotate and overwrite

Question

I'm trying to find an integer (6664) in a binary file (file.bin) and I must shift this integer to the right once and write this result to the same position the integer was found. However, I am not able to overwrite the original value with the shifted value. I developed the code below:

#include <cstdio>
#include <cstring>
int main(){

    
    //opening file
    FILE *pfile
    pfile = fopen("file.bin","r+b");

    //ffinding file size
    fseek(pfile, 0L, SEEK_END);
    long size= ftell(pfile);
    rewind(pfile);
    //reading data
    int vec[size];
    fread(vec, sizeof(int), size, pfile);
    
    for(int i = 0; i<size; i++){
         if(vec[i]==6664){
             int aux = vec[i]>>1;
             fseek(pfile, i, SEEK_SET);
             fwrite(&aux, sizeof(int), 1, pfile);
            
     }
        }
        
    
    
    return 0;
}

I generated the file with the following command:

echo "0000000: 6408 0623 77ef bfbd efbf bdef bfbd 2779 d..#w.........'y
0000010: efbf bdef 081a 0000 0000 efbf bdef bfbd ................
0000020: 4577 efbf bdef bfbd efbf bd00 Ew.........." | xxd -r > file.bin

Any thoughts?

`fseek()` doesn't know that you are looking at `int`s. It expects the offset in bytes. — G. Sliepen, Jun 15 '21 at 21:54
`size` also is the size in bytes, yet you're asking for that many `int` values. that read op is going to fall woefully short of fulfilling a request to read `size` number of `int` values, but you never knew that because you never bothered to reap the result of the read. — WhozCraig, Jun 15 '21 at 21:58
Note that `int vec[size];` is illegal in C++. Some compilers provide support for as an *extension*, but you shouldn't rely on. At very least you create unportable code that way. — Aconcagua, Jun 15 '21 at 22:05
Why are you coding what essentially is plain C in C++? Please invest in [some good C++ books](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) to learn how to use C++ properly. And take a few classes. — Some programmer dude, Jun 15 '21 at 22:07
With multibyte integers, you got to ask: Little Endian or Big Endian? Is the first byte the most significant or the least significant? It may effect your outcome. — Thomas Matthews, Jun 15 '21 at 22:19
*"I am not able to complete what I wanted."* -- could you be more specific? Your task has several parts. Were you able to generate the file you intended to generate? Were you able to find the integer? Were you able to do the bit shift? Were you able to write this result? Were you able to write at the desired position in the file? Please focus on one specific issue. — JaMiT, Jun 15 '21 at 22:27
Are you talking about finding 0x6664 or decimal 6664. I don't see 0x6664 or 0x6466 in your input example. — Thomas Matthews, Jun 15 '21 at 22:27
@ThomasMatthews: There's a decimal integer literal in the question so it would seem to be decimal `6664` not `0x6664`. 6664 is `0x1A08` which does appear in the file, byte order is hard to determine because of xxd unusual formatting. — Ben Voigt, Jun 15 '21 at 22:42
@BenVoigt I was not sure because the data in the input is hexadecimal without any prefix. — Thomas Matthews, Jun 15 '21 at 22:55
@ThomasMatthews C++ rules apply in the C++ code, `xxd` rules apply in the data piped to `xxd` — Ben Voigt, Jun 15 '21 at 22:56
@JulyH are you dealing with 16bit integers or 32bit integers? — Remy Lebeau, Jun 15 '21 at 23:26

score 0 · Answer 1 · answered Jun 15 '21 at 22:42

IMHO, you should read in chunks. There is no guarantee that your platform or executable has enough memory to read in the file (files can be huge).

static const size_t QUANTITY_INTEGERS = 1024u * 1024u;
uint16_t buffer[QUANTITY_INTEGERS] = {0};
std::ifstream number_file("file.bin", ios::binary);
while (number_file.read((char *) &buffer[0], QUANTITY_INTEGERS * sizeof(uint16_t)))
{
    unsigned int numbers_read = number_file.gcount();
    uint16_t * p_number = std::find(&buffer[0], &buffer[numbers_read], 0x6664);
    if (p_number != &buffer[numbers_read])
    {
         std::cout << "0x6664 found.\n";
         break;
    }
}

You can use whatever buffer capacity you want.

`read()` operates on bytes, so after each successful `read()`, `gcount()` will return the number of bytes read. To get the number of whole integers read, you need to divide the `gcount` value by `sizeof(uint16_t)` — Remy Lebeau, Jun 15 '21 at 23:20

Remy Lebeau · Answer 2 · 2021-06-15T23:35:58.387

Aside from the complete lack of error handling, you are not adequately differentiating between bytes and integers.

Your size variable is expressed in bytes, but you are using it as a count when dealing with your vec array. As such, you are over-allocating your array, reading too many ints from the file into the array, and iterating through too many elements of the array.

Also, when writing back to the file, you are treating the loop counter i as a byte offset, which it is not, thus you are writing to the wrong offset in the file.

Try something more like this instead:

#include <cstdio>
#include <cstdint>
#include <vector>

typedef int32_t myint_t; // or int16_t, if needed...

int main()
{
    //opening file
    FILE *pfile = fopen("file.bin", "r+b");
    if (!pFile) return -1

    //finding file size
    //TODO: use f/stat() or std::filesystem::file_size() instead
    if (fseek(pfile, 0L, SEEK_END) < 0) {
        fclose(pFile);
        return -1;
    }
    long size = ftell(pfile);
    if (size < 0) {
        fclose(pFile);
        return -1;
    }
    rewind(pfile);

    //reading data
    std::vector<myint_t> vec(size / sizeof(myint_t));
    size_t numItems = fread(vec.data(), sizeof(myint_t), vec.size(), pfile);
    
    for(size_t i = 0; i < numItems; ++i) {
        if (vec[i] == 6664) {
            vec[i] >>= 1;
            // TODO: it is more efficient for the filesystem to make
            // smaller relative seeks using SEEK_CUR than to make
            // larger seeks from the beginning using SEEK_SET...
            if (fseek(pfile, i * sizeof(myint_t), SEEK_SET) < 0) {
                fclose(pFile);
                return -1;
            }
            if (fwrite(&vec[i], sizeof(myint_t), 1, pfile) != 1) {
                fclose(pFile);
                return -1;
            }
        }
    }
        
    fclose(pfile);

    return 0;
}

That being said, consider using C++-style file I/O instead of C-style file I/O, eg:

#include <iostream>
#include <vector>
#include <cstdint>

using myint_t = int32_t; // or int16_t, if needed...

int main()
{
    std::fstream fs;
    fs.exceptions(std::fstream::failbit | std::fstream::badbit);

    try {
        //opening file
        fs.open("file.bin", std::ios::in | std::ios::out | std::ios::binary | std::ios::ate);

        //finding file size
        //TODO: use f/stat() or std::filesystem::file_size() instead
        long size = fs.tellg();
        fs.seekg(0);

        //reading data
        std::vector<myint_t> vec(size / sizeof(myint_t));
        fs.read(reinterpret_cast<char*>(vec.data()), sizeof(myint_t) * vec.size());
        size_t numItems = fs.gcount() / sizeof(myint_t);

        for(size_t i = 0; i < numItems; ++i) {
            if (vec[i] == 6664) {
                vec[i] >>= 1;
                // TODO: it is more efficient for the filesystem to make
                // smaller relative seeks using std::ios::cur than to make
                // larger seeks from the beginning...
                fs.seekp(i * sizeof(myint_t));
                fs.write(reinterpret_cast<char*>(&vec[i]), sizeof(myint_t));
            }
        }
    }
    catch(...) {
        return -1;
    }

    return 0;
}

Though, the following would be a little bit simpler and safer:

#include <iostream>
#include <vector>
#include <filesystem>
#include <cstdint>

using myint_t = int32_t; // or int16_t, if needed...

int main()
{
    std::ifstream ifs;
    std::ofstream ofs;

    ifs.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    ofs.exceptions(std::ofstream::failbit | std::ofstream::badbit);

    try {
        //opening files
        ifs.open("file.bin", std::ios::binary);
        ofs.open("file.new", std::ios::binary);

        //reading data
        myint_t value;
        bool found = false;
        while (ifs.read(reinterpret_cast<char*>(&value), sizeof(value))) {
            if (value == 6664) {
                value >>= 1;
                found = true;
            }
            ofs.write(reinterpret_cast<char*>(&value), sizeof(value));
        }

        ifs.close();
        ofs.close();

        if (found) {
            std::filesystem::rename("file.bin", "file.bak");
            std::filesystem::rename("file.new", "file.bin");
            std::filesystem::remove("file.bak");
        }
        else {
            std::filesystem::remove("file.new");
        }
    }
    catch(...) {
        return -1;
    }

    return 0;
}

Find a number in a binary file, rotate and overwrite

2 Answers2