I tried to write a program that loads chunks of a large file (a few MB), searches for a value, and prints its address and value. However, every few reads the program hits the `!myfile` error branch, it prints a weird symbol instead of the value (even though I've used `hex` with `cout`), the addresses seem to loop, and it doesn't find all the values. I've tried for a long time and gave up, so I'm asking experienced coders out there to find the issue. I should note that I'm trying to find a 32-bit value in this file, but all I could make was a program that checks single bytes; I'd need assistance with that too.

#include <iostream>
#include <fstream>
#include <climits>
#include <sstream>
#include <windows.h>
#include <math.h>

using namespace std;

int get_file_size(std::string filename) // path to file
{
    FILE *p_file = NULL;
    p_file = fopen(filename.c_str(),"rb");
    fseek(p_file,0,SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

int main( void )
{
    ifstream myfile;

    myfile.open( "file.bin", ios::binary | ios::in );


    char addr_start = 0,
                      addr_end   = 0,
                      temp2      = 0x40000;

    bool found = false;

   cout << "\nEnter address start (Little endian, hex): ";
   cin >> hex >> addr_start;
   cout << "\nEnter address end (Little endian, hex): ";
   cin >> hex >> addr_end;

    unsigned long int fsize = get_file_size("file.bin");
    char buffer[100];

    for(int counter = fsize; counter != 0; counter--)
    {
        myfile.read(buffer,100);
        if(!myfile)
        {
            cout << "\nAn error has occurred. Bytes read: " << myfile.gcount();
            myfile.clear();
        }
        for(int x = 0; x < 100 - 1 ; x++)
        {
            if(buffer[x] >= addr_start && buffer[x] <= addr_end)
                            cout << "Addr: " << (fsize - counter * x) << "  Value: " << hex << buffer[x] << endl;
        }

    }

    myfile.close();
    system("PAUSE"); //Don't worry about its inefficiency
}
  • 1. A couple of MB is not a big file :) 2. You don't need to use the old C API to get the file's size: http://stackoverflow.com/a/5840160/634821 – Violet Giraffe Dec 27 '15 at 22:06
  • 3. You loop the file byte by byte, but read it in chunks 100 bytes long each time. Given that your files are really not that long, I would just process byte by byte, without the 100 bytes buffer. Way simpler that way. – Violet Giraffe Dec 27 '15 at 22:08
  • Is your value 1 byte long? What exactly do you *need* to do? – Violet Giraffe Dec 27 '15 at 22:15
  • The code is meant to grab 32 bits at a time, and compare them with a value. However, I was only able to grab 1 byte at a time, since 'char' = 1 byte, and I can't use anything else because read() requires you to use a char buffer – Паша Датский Dec 27 '15 at 22:40
  • On most platforms, `char` is a signed type. As such, entering a starting value of 127 and an ending value of 129 results in the ending value actually being -127 in two's complement arithmetic, so the if() statement searches for a value greater than or equal to 127 and less than or equal to -127, which is quite a difficult task. Secondly, although it's true that read() is defined as reading into a char buffer, once the data is read nothing prohibits the char buffer from being converted to some other datatype, say a buffer of longs or unsigned longs. – Sam Varshavchik Dec 27 '15 at 23:18
  • You open the file. Your function that gets the size opens the same file, **again**. The size function closes the file and returns. Your main program (or the operating system) is now confused: is the file open or closed? Where is the file position pointer? Maybe you should pass the file stream by reference to get the size (or not use a function for 3 lines of code). – Thomas Matthews Dec 27 '15 at 23:29
  • When playing with binary data, you should use **`unsigned`** types. For example, you may want `uint8_t` instead of `char` (note: there are three distinct types: `signed char`, `unsigned char`, and plain `char`). – Thomas Matthews Dec 27 '15 at 23:32
  • You don't need to know the size of the file. You read bytes into a buffer until you hit end of file. A simple `while` statement will suffice. – Thomas Matthews Dec 27 '15 at 23:33
  • @SamVarshavchik: The definition of `char` (as to signed or unsigned) is *compiler* dependent, not platform dependent. I have two compilers that have options for setting the type of `char`. This really confuses the static analyzers, too. – Thomas Matthews Dec 27 '15 at 23:35
  • @Паша Датский, Just because you are reading into a char buffer does not mean that you can't interpret it as an array of 32-bit ints. Also, your buffer size should be a multiple of 4 to avoid reading part of an int. Given the relatively small size of the files, I'd be tempted to read the entire file into a buffer (known as slurping the file) or, alternatively, read only 4 bytes at a time to simplify your logic. – Ken Clement Dec 27 '15 at 23:42
  • @Ken Clement That's what I've done, but I guess yours works better than chunking the file 100 bytes at a time :) @Sam Varshavchik I see the issue now, thanks. – Паша Датский Dec 28 '15 at 10:41
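
To make the signedness point from the comments concrete, here is a minimal sketch. The helper names `parse_hex` and `byte_in_range` are hypothetical, not part of the question's code. It assumes the range bounds are entered as hex text and that each byte from the file should be compared as a value from 0 to 255.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parse hex text such as "81" into a number.
// Extracting into an unsigned int parses the digits; extracting into a
// char (as the question does) reads a single character instead.
unsigned int parse_hex(const std::string& text)
{
    std::istringstream in(text);
    unsigned int value = 0;
    in >> std::hex >> value;
    return value;
}

// Hypothetical helper: range test on a raw byte from a char buffer.
// The cast to unsigned char maps the byte to 0..255 before comparing,
// so values above 0x7F no longer compare as negative numbers.
bool byte_in_range(char raw, unsigned int low, unsigned int high)
{
    const unsigned int byte = static_cast<unsigned char>(raw);
    return byte >= low && byte <= high;
}
```

The same cast helps with output: streaming a `char` to `cout` prints it as a character, so `hex` has no effect on it, whereas `cout << hex << static_cast<unsigned int>(static_cast<unsigned char>(raw))` prints the numeric value.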

1 Answer

A simple program to search for a 32-bit integer in a binary file:

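#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>

using namespace std;

int main(void)
{
  ifstream data_file("my_file.bin", ios::binary);
  if (!data_file)
  {
    cerr << "Error opening my_file.bin.\n";
    return EXIT_FAILURE;
  }
  const uint32_t search_key = 0x12345678U;
  uint32_t value;
  // Read one 32-bit value (4 bytes) at a time until a read fails.
  while (data_file.read((char *) &value, sizeof(value)))
  {
    if (value == search_key)
    {
      cout << "Found value.\n";
      break;
    }
  }
  return EXIT_SUCCESS;
}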

You could improve performance by reading a block of values into a buffer and then searching the buffer.

//...
const unsigned int BUFFER_SIZE = 1024;
static uint32_t  buffer[BUFFER_SIZE];
bool found = false;
while (!found)
{
  // read() takes a byte count. A short final read sets failbit,
  // so test gcount() rather than the stream state.
  data_file.read((char *) &buffer[0], sizeof(buffer));
  const streamsize bytes_read = data_file.gcount();
  if (bytes_read <= 0)
  {
    break;
  }
  const unsigned int values_read = ((unsigned int) bytes_read) / sizeof(uint32_t);
  for (unsigned int index = 0U; index < values_read; ++index)
  {
    if (buffer[index] == search_key)
    {
      cout << "Value found.\n";
      found = true; // Leave the outer loop as well.
      break;
    }
  }
}

With the above code, when a read fails (for example, at the end of the file), the number of bytes actually read should be checked with `gcount()`; if any bytes were read, that part of the buffer should still be searched.

Thomas Matthews
  • Wow, this is surely interesting. Although you forgot to close a bracket, I'm impressed. It was helpful, really helpful. The first one was pretty easy to follow, but the second one appears to be more difficult to understand; I'll give it a deeper look. Nonetheless, although it does find the values, it has trouble finding them all. A certain value present in the file 31 times has only been found 3 times, perhaps because the file size (56 MB) is an issue? Thank you for your help. – Паша Датский Dec 28 '15 at 01:24
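
One possible explanation for the missing matches, offered as a guess rather than a diagnosis, is that the answer's loops only test values that start on 4-byte boundaries, while occurrences in the file may start at any byte offset. A differing byte order would be another candidate. The sketch below checks every byte offset of data that has already been loaded into memory; `find_all_offsets` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: return the byte offset of every occurrence of
// `key` in `data`, whether or not it starts on a 4-byte boundary.
std::vector<std::size_t> find_all_offsets(const std::vector<unsigned char>& data,
                                          std::uint32_t key)
{
    std::vector<std::size_t> offsets;
    for (std::size_t pos = 0; pos + sizeof(key) <= data.size(); ++pos)
    {
        std::uint32_t value;
        // memcpy is a safe way to read an unaligned 32-bit value.
        std::memcpy(&value, &data[pos], sizeof(value));
        if (value == key)
        {
            offsets.push_back(pos);
        }
    }
    return offsets;
}
```

For a file of this size, the whole contents could be read into the vector first, as Ken Clement's comment suggests. The comparison uses the machine's own byte order, so a file written with the opposite byte order would need the key's bytes swapped before searching.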