2

I'm currently implementing the Burrows-Wheeler transform (and its inverse) for raw binary data (JPEG files, for example). Testing on ordinary data such as text files works without problems. But when reading a JPEG file, for example, input stops at the character 0x1A, the substitute character. I've searched the internet for a solution that doesn't require OS-dependent code, without success. I was thinking of reading stdin in binary mode, but I guess that isn't easy. Is there a simple way to solve this problem?

code:

buffer = (unsigned char*) calloc(block_size+1,sizeof(unsigned char));
length = fread((unsigned char*) buffer, 1, block_size, stdin);
if(length == 0){
    // file is empty
}else{
    b_length = length;
    while(length == b_length){
        buffer[block_size] = '\0';
        encodeBlock(buffer,length);
        length = fread((unsigned char*) buffer, 1, block_size, stdin);      
    }
    if(length != 0){            
        buffer[length] = '\0';
        encodeBlock(buffer,length);
    }
}
free(buffer);
user1745184

5 Answers

5

As you've noticed, you're reading from stdin in text mode, and the read stops at the SUB character (0x1A, substitute, aka Ctrl+Z, the DOS end-of-file marker).

On Windows, you have to switch the stream to binary mode with _setmode:

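#include <stdio.h>
#if defined(_WIN32)
#include <io.h>     /* _setmode */
#include <fcntl.h>  /* _O_BINARY */
#endif /* defined(_WIN32) */

/* ... */

#if defined(_WIN32)
/* Switch stdin to binary mode before the first read. */
_setmode(_fileno(stdin), _O_BINARY);
#endif /* defined(_WIN32) */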

On platforms other than Windows you don't run into this distinction in modes.

user7116
  • I have exactly the same problem as the OP. My input file contains "^Z", and even when I try to force stdin into binary mode, reading still stops with EOF. I have tried setmode, but it just doesn't work. Can anyone help? Maybe I'll post a new question. – eleijonmarck Dec 15 '14 at 14:45
  • 1
    Hello everyone, I wanted to point out that _setmode(_fileno(stdin), _O_BINARY); has to come before everything else in the main() function. – eleijonmarck Jan 29 '15 at 14:12
  • The last two hours were like a twilight zone for me. I tested back and forth on both Linux and Windows and suspected the cause had to be the default open mode of stdin. I thought it was `setvbuf`. I generated 512 random bytes via `dd if=/dev/urandom of=bin.dat bs=512 count=1` and noticed that reading always stops at every `0x1A`, and that clue led me here. The hair pulling is over. Thanks. – daparic Aug 18 '20 at 19:42
3

You cannot do this without an OS dependency. The C language specification says (7.19.3)

At program startup, three text streams are predefined...

stdin is a text stream. Depending on your OS, there may be ways to change the mode of an existing stream or access the low-level stream data, but you claim that you do not want any OS-specific code.

Raymond Chen
  • Well, I was hoping to find a solution without any OS-specific code. The program has to run on Linux, but I'm currently debugging on Windows. So how should I solve this problem on Linux? – user1745184 Oct 17 '12 at 20:46
2

You can use _setmode to convert stdin to binary mode.

There is also freopen -- see this SO question

Community
Doug Currie
1

You must open the file as a binary file.

Use something similar to

fopen("file", "rb");
Earlz
  • stdin is also a file. @Earlz's advice is correct (on Microsoft "operating systems", at least). – wildplasser Oct 17 '12 at 20:21
  • 1
    On unix the "b"inary flag is silently ignored. 0x1a is not a special character, not even in stdin. – wildplasser Oct 17 '12 at 20:26
  • And as far as I know, stdin is not a file... There is still a difference between a stream and a file. – user1745184 Oct 17 '12 at 20:38
  • @user1745184 as far as fopen and other basic file I/O is concerned, a file IS-A stream. Streams only act different when seeking – Earlz Oct 17 '12 at 21:11
  • @Earlz Not really. The difference between a file and a stream is that a stream represents a flow of data from one place to another (for example, disk to memory). A file is more a representation of something that is stored, on a disk for example. But you are able to assign a stream (like stdin) to what is called a FILE pointer, which actually represents information about a file/stream. So generally a stream "streams" data out of or into a file. In that sense a stream IS-NOT a file. – user1745184 Oct 17 '12 at 22:58
  • @user1745184: the words "FILE" and "stream" are heavily overused. The old Unix adage "everything is a file" refers to the almost physical level: inodes and such. The FILE structure is a wrapper around an open file descriptor, which is a pointer into the system file table, which is an "opened" inode. Files/inodes can have some properties, such as is-seekable, is-mmappable, or is-selectable (but stdin is still a FILE*). The word "stream" is even worse: it can refer to protocol handlers (System V) or special kinds of FILEs (C++). The semantics depend on the domain where the term is used. – wildplasser Oct 18 '12 at 00:35
1

Use read() to read in the data.
Since you are interested in getting data from stdin, use

fd = fcntl(STDIN_FILENO, F_DUPFD, 0);

to obtain a duplicate of stdin's file descriptor (STDIN_FILENO itself can also be passed to read() directly).

More info here.

The issue is that Windows, when reading in text mode, treats 0x1A (Ctrl+Z) as end-of-file. As Earlz pointed out, opening the file in binary mode fixes this on Windows and works on Linux too.

TheCodeArtist