
tl;dr: What control characters must a custom file contain in order to trigger (or not trigger) failbit, badbit, or eofbit, depending on the OS?


I am working with Cygwin and Notepad++, using Windows 7 on an x64 laptop.

I have a file called genesis.o that I typed in a hex editor; it is in the same directory as sandbox.cpp, which does the reading/writing. The file "genesis.o" contains:

genesis.o 
-----------------
58 - 4f - 58 - 4f 
58 - 58 - 58 - 58 
3f - 3f - 3f - 3f 
00 - 00 - 00 - 3f  
00 - 00 - 3f - 00
00 - 3f - 00 - 00
3f - 00 - 00 - 00 
00 - 00 - 00 - 58
00 - 00 - 58 - 00
00 - 58 - 00 - 00  
58 - 00 - 00 - 00 
48 - 45 - 4c - 4c  

So far, I have read and manipulated custom structures without any regard for the validity of the file. I used stat to check whether the file is accessible, and that was enough.

This file (with and without the ".o" extension) passes every check I made except file.rdstate(), which always returns 4 (std::ios::failbit).

The error doesn't show up with any other normal file, so I am guessing that some sort of control character sequence before, after, or inside the file tells std::fstream that the file is valid.

Since no other file (except those typed in a hex editor) triggers this behaviour, is there a way to structure a custom binary file so that it is recognised by fstream? Some sort of control characters, preset flags, etc.?

I am using std::ios::in | std::ios::binary. I read the file by taking buffer.st_size from stat, dividing it by 4 (since I read 4-byte integers), and then calling:

uint32_t temp = 0;
file.read( (char *)(&temp), sizeof(uint32_t) );

It is worth mentioning that I can read the binary file even when file.rdstate() returns failbit.

Minimal testable example. Just make a "genesis.o" file with the byte specification above.

#include <fstream>
#include <iostream>
#include <string>

#include <sys\types.h>
#include <sys\stat.h>
#include <vector>

struct handl{
    std::string name = "genesis";
    std::string ext  = "o";
    std::vector<uint32_t> mem;
    bool acces = false;
    struct stat buffer;

    handl():mem(0),name("genesis"),ext("o"){}

    const char *f_name(){
        std::string f_n = this->name;
        f_n.append(".");
        f_n.append(this->ext);
        return f_n.c_str();
    }

    void recheck(){
        this->acces = ( stat(this->f_name(), &this->buffer ) == 0 );
    }

    virtual bool header( std::fstream &file )
    {return true;}
    virtual bool footer( std::fstream &file )
    {return true;}

    void operator()(){ this->recheck(); }
    void operator()( const char *name, const char *ext ){
        this->name = std::string(name);
        this->ext  = std::string(ext);
        this->recheck();
    }

    void prefix( const char* pre ){
        std::string pn(pre);
        pn.append( this->name );
        this->name = pn;
    }

    void suffix( const char *su ){
        this->name.append(su);
        this->recheck();
    }

    int read(){

        this->recheck();
        if( !this->acces ){return 0;}

        std::fstream file;
        file = std::fstream( this->f_name(), std::ios::in | std::ios::binary );

        if( this->header(file) && this->footer(file) ){

                int byte_size = this->buffer.st_size;

                std::cout << file.rdstate() << std::endl;
                std::cout << "gb\t" << std::ios::goodbit << std::endl;
                std::cout << "bb\t" << std::ios::badbit  << std::endl;
                std::cout << "eb\t" << std::ios::eofbit  << std::endl;
                std::cout << "fb\t" << std::ios::failbit << std::endl;

                file.close();
                return 1;
        }else{
            file.close();
            return 0;
        }

    }

};

int main(){

    handl f;

    std::cout << f.read() << std::endl;

    return 0;
}
Danilo
  • Can you tell us what mode you are opening it in and how you are reading it? My initial (unverified) suspicion is likely related to binary vs text mode. – nanofarad Aug 15 '19 at 13:45
  • Thanks. I don't have enough insight to directly assist but I'm certain that this info will help clarify for someone more experienced with C++ file I/O. – nanofarad Aug 15 '19 at 13:54
  • Do you want an enumeration? Or can you narrow down which character specifically causes this? Shouldn't take long, and then you'll probably be able to find the answer quite quickly – Lightness Races in Orbit Aug 15 '19 at 13:59
  • Enumeration? No, I don't need an enumeration of control characters, Wikipedia does that just fine. Maybe it would be clearer if I mention PNG files. They have a `valid load` section that checks whether the file is read in text or binary mode. I wager this isn't the first time something like this has happened, and people surely have some sequence (like valid load) that circumvents this behaviour. – Danilo Aug 15 '19 at 14:03
  • You do know that [`failbit` is also set on EOF](https://stackoverflow.com/questions/6781545/why-is-failbit-set-when-eof-is-found-on-read), right? Could that be it? – rustyx Aug 15 '19 at 14:04
  • Yeah, I've tried EOT/EOF input (in the HELL portion) and it still didn't work. Which kind of makes sense, since binary information sometimes uses EOT/EOF or NULL, so some sort of unique sequence is better for recognition than a random position of EOT/EOF. If my wording was too complicated: sometimes EOF can appear in the data stored in a file (04-5f) as a product of a checksum or something else. Too much risk of terminating a file based on the randomness of data. – Danilo Aug 15 '19 at 14:08
  • Probably a bug in Cygwin: compiler, standard libraries, etc. It’s not particularly reliable. If you want to build for Linux, either upgrade to Win10 and use WSL, or use a Linux VM. If you don’t care about platform, switch to the native toolchain: install VS2017, the community edition is freeware and at least on paper it supports Windows 7 SP1. – Soonts Aug 15 '19 at 14:31
  • Currently I am at the hacking stage, so I just need it to work. Why, how, and cross-platform support will be important later on. Just thought of something: could this be an issue with file recognition (I don't know which process/driver handles this)? Like some sort of function that checks the magic of a file against some sort of collection, and if the magic is found in some sort of table, it recognises it as a file, but if it isn't found it automatically results in failbit? So perhaps it is a matter of registration rather than reading ... – Danilo Aug 15 '19 at 14:36
  • A [mcve] would go a long way... right now no-one can answer this question. – rustyx Aug 15 '19 at 14:48
  • I just edited the question. Make your own "`genesis.o`" file with the specification, since I can't upload the file here. – Danilo Aug 15 '19 at 15:24
  • Use forward slashes in includes, like `<sys/stat.h>`. It'll help if you want to compile it in a POSIX environment later. – Ted Lyngmo Aug 15 '19 at 17:15

2 Answers

return f_n.c_str();

Your local f_n object is destroyed and the pointer to its internal memory is dangling. Using it is undefined behaviour. File content is irrelevant.
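One possible way to avoid the dangling pointer is to return the std::string by value instead of a `const char *`. The following is only a sketch: `make_file_name` is a hypothetical free function standing in for `handl::f_name()`, not part of the original code.

```cpp
#include <string>

// Hypothetical replacement for handl::f_name(): builds "name.ext" and
// returns the std::string itself, so the caller owns the character buffer.
std::string make_file_name(const std::string& name, const std::string& ext) {
    std::string f_n = name;
    f_n.append(".");
    f_n.append(ext);
    return f_n;  // returned by value; nothing points into a destroyed local
}
```

Since C++11, std::fstream accepts a std::string path directly, e.g. `std::fstream file(make_file_name("genesis", "o"), std::ios::in | std::ios::binary);`. For `stat`, call `.c_str()` on a string that is still alive at the point of the call.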

n. m. could be an AI
  • This whole mystery could have been solved in one call to `file.is_open()`. The behaviour is of course still undefined, but there are *high chances* to observe this evaluating to `false`. – n. m. could be an AI Aug 15 '19 at 16:47
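A minimal sketch of that check, assuming the path is passed as a valid std::string. Both helper names are hypothetical and only illustrate that a failed open is reported by `is_open()` and leaves `failbit` set, regardless of the file's contents.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reports whether the open itself succeeded.
bool opens_for_reading(const std::string& path) {
    std::fstream file(path, std::ios::in | std::ios::binary);
    return file.is_open();
}

// Hypothetical helper: returns the stream state right after the open attempt.
// The fstream constructor sets failbit when the file cannot be opened.
std::ios::iostate state_after_open(const std::string& path) {
    std::fstream file(path, std::ios::in | std::ios::binary);
    return file.rdstate();
}
```

Calling `opens_for_reading` before looking at `rdstate()` separates "the file could not be opened" from "a read failed", which is the distinction the question was missing.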

I tried your code, and this line: if( !this->acces ){return 0;} is failing for me. By "failing", I mean this is causing the read function to return 0.

Marshall Clow
  • Weird ... did you make the "genesis.o" file in the same directory as this test code? I am asking since stat returns true if that is the case. – Danilo Aug 15 '19 at 17:03
  • Yes, I did. The file is getting correctly opened, too. I added `if (!file) { std::cout << "Unable to open file" << std::endl;}` and that doesn't print anything. – Marshall Clow Aug 15 '19 at 17:11