6

I've created a text file that has 256 characters, the first character of the text file being ASCII value 0 and the last character of the text file being ASCII value 255. The characters in between increment from 0 to 255 evenly. So character #27 is ASCII value 27. Character #148 should be ASCII value 148.

My goal is to read every character of this text file.

I've tried reading this with cin. I tried cin.get() and cin.read(), both of which are supposed to read unformatted input. But both fail when reading the 26th character. I think when I used an unsigned char, cin said it was reading in 255, which simply isn't true. And when I used a normal signed char, cin said it was reading in -1. It should be reading in whatever the character equivalent of ASCII 26 is. Perhaps cin thinks it's hit EOF? But I've read on separate StackOverflow posts previously that EOF isn't an actual character that one can write. So I'm lost as to why cin is choking on this character and reporting integer -1 or integer 255 instead. Could someone please tell me what I'm doing wrong, why, and what the best solution is, and why?

There's not much concrete code to paste. I've tried a few different non-working combinations, all involving either cin.get() or cin.read() with either char or unsigned char, and all sorts of casts to char and int in between. I've had no luck with being able to read past the 26th character, except for this:

unsigned char character;

while ( (character = (unsigned char)cin.get()) != EOF) { ... }

Interestingly enough, although this doesn't stop my while loop at the 26th character, it doesn't move on either. It seems like cin, whether it's cin.get() or cin.read(), just refuses to advance to the next character the moment it detects something it doesn't like. I'm also aware that something like cin.ignore() exists, but my input isn't predictable; that is, these 256 characters for my text file are just a test case, and the real input is rather random. This is part of a larger homework assignment, but this specific question is not related to the assignment; I'm just stuck on part of the process.

Note: I am reading from the standard input stream, not a specific text file. Still no straightforward solution it seems. I can't believe this hasn't been done on cin before.

Update:

On Windows, it stops after character 26 probably due to that Ctrl-Z thing. I don't care that much for this problem. It only needs to work on Linux.

On Linux, though, it reads all characters from 0 - 127. But it doesn't seem to be reading the extended ASCII characters from 128 to 255. There's a "solution" program that produces output we're supposed to imitate, and that program is able to read all 256 characters somehow.

Question: How, using cin, can I read all 256 character values (0 to 255)?

Solved

Using:

int characterInt;
unsigned char character;

while ( (characterInt = getchar()) != EOF )
{
    // 'character' now stores values from 0 - 255
    character = (unsigned char)(characterInt);
}
Jason
  • ASCII goes from 0 to 127. Byte values 128 through 255 aren't ASCII, though there are a bazillion of (now-)awful encodings that take 0-127 from ASCII and usurp 128-255 for their own nefarious purposes. –  Feb 21 '13 at 17:57
  • @delnan Why do you say awful? The ISO 8859 encodings were almost universal in Europe, and even now are widespread. (I tend to use UTF-8, but there are still a lot of sites in France and Germany which use ISO 8859-1 or ISO 8859-15. Remember that things like `isalpha` don't work with UTF-8.) – James Kanze Feb 21 '13 at 18:03
  • @JamesKanze I say now-awful because, unlike Unicode encodings, you cannot actually express a lot of characters in any single string using one of those, as they are all incompatible with one another, and it's impossible to differentiate them reliably. I'm well aware that some of them are quite popular (I live in Germany) but that doesn't make them any better, it just makes them legacy cruft. I'm also aware that they were a somewhat reasonable solution at the time they were created. But for the past one to two decades, they've been just a source of encoding mishaps and pain, and inferior to UTF-8. –  Feb 21 '13 at 18:08
  • Even with all these Ctrl-Z or text-mode-not-binary-mode issues, why is `read()` and `get()` failing? – Jason Feb 21 '13 at 18:16
  • Perhaps because you're using `get()` incorrectly. Correct usage is: `int character; while ( (character = cin.get()) != EOF) { ... }` – Robᵩ Feb 21 '13 at 19:33
  • I think that was it, yeah. Also ended up using `getchar()`. – Jason Feb 22 '13 at 08:33
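
The usage suggested in the comments above can be sketched as follows. This is only an illustration, not code from the thread: the helper names `count_bytes` and `eof_survives_unsigned_char_cast` are made up here, and the sketch assumes a typical platform where the end-of-file marker is a negative `int`. It keeps the result of `get()` in an `int`, and separately shows why casting that result to `unsigned char` before comparing it with EOF can never end the loop.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts bytes by keeping the result of get() in an int,
// so the end-of-file marker stays distinct from the byte value 255.
std::size_t count_bytes(std::istream& in) {
    std::size_t n = 0;
    int c;
    while ((c = in.get()) != std::istream::traits_type::eof()) {
        ++n;
    }
    return n;
}

// Hypothetical helper: squeezes the end-of-file marker into an unsigned char,
// as the cast in the question's loop does, and reports whether the narrowed
// value still compares equal to the marker.
bool eof_survives_unsigned_char_cast() {
    unsigned char narrowed =
        static_cast<unsigned char>(std::istream::traits_type::eof());
    return narrowed == std::istream::traits_type::eof();
}
```

Calling `count_bytes(std::cin)` would apply the same loop to standard input. On Windows the text-mode Ctrl-Z handling discussed in the answers still applies, since this sketch does nothing about the stream's mode.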

3 Answers

5

I presume you are on Windows. On the Windows platform, character 26 is Ctrl-Z, which is used in a console to represent end of file, so iostreams thinks your file ends at that character.

It only does this in text mode, which cin is using. If you open a stream in binary mode, it won't do this.
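
As a rough sketch of the binary-mode point, assuming the data can be read from a named file rather than from cin: the function name `read_all_bytes` is made up for illustration. Opening the stream with `std::ios::binary` asks the runtime not to translate or interpret any bytes.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every byte of the named file.
// std::ios::binary requests that no byte be translated or treated specially,
// which matters on platforms that distinguish text mode from binary mode.
std::vector<unsigned char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) {  // get(char&) fails only at end of file or on error
        bytes.push_back(static_cast<unsigned char>(c));
    }
    return bytes;
}
```

This does not change the mode of cin itself, which is opened by the runtime before the program starts.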

jcoder
  • I am coding this on Windows, though the program is to run on Linux/Unix. – Jason Feb 21 '13 at 17:52
  • @Jason you'll find that it works different on Linux since the runtime library doesn't use this convention. – Mark Ransom Feb 21 '13 at 17:52
  • Yeah, I just ran a diff and the output still isn't ideal, but it seems to be reading in quite a few more characters. I'll have to spend some more time on the details to try to work it out. – Jason Feb 21 '13 at 17:54
3

std::cin reads text streams, not arbitrary binary data.

As to why the 26th character is interesting, you are probably using a CP/M derivative (such as MS-DOS or MS-Windows). In those operating systems, Control-Z is used as an EOF character in text files.


EDIT: On Linux, using g++ 4.4.3, the following program behaves precisely as expected, printing the numbers 0 thru 255, inclusive:
#include <iostream>
#include <iomanip>

int main () {
  int ch;
  while( (ch=std::cin.get()) != std::istream::traits_type::eof() )
    std::cout << ch << " ";
  std::cout << "\n";
}
Robᵩ
  • Would this work on the standard input stream? I'm not really reading a specific text file. – Jason Feb 21 '13 at 17:49
  • This might help: http://stackoverflow.com/questions/7587595/read-binary-data-from-stdcin – Robᵩ Feb 21 '13 at 17:50
  • An interesting history lesson here: http://blogs.msdn.com/b/oldnewthing/archive/2004/03/16/90448.aspx – Mark Ransom Feb 21 '13 at 17:56
  • You should be able to read anything if the file is opened in binary mode. But you can't change the mode once the file is opened, and `std::cin` is opened by the runtime, not by you. – James Kanze Feb 21 '13 at 18:00
  • According to [this post](http://www.parashift.com/c++-faq/binary-mode-for-cin-cout.html), it doesn't seem common. How do most people read binary data from `cin`? Here's [another post](http://www.cpp-home.com/forum/viewtopic.php?f=4&t=15106) where the poster seems to have the exact same problem I'm having. – Jason Feb 21 '13 at 18:03
  • @Rob, according to that link, a few posters say there is no solution. One poster's "solution" looks pretty mangled and it seems to use new C++ features? The other "solution" is platform-specific to Windows. All I want is to be able to read all my character values and be happy... – Jason Feb 21 '13 at 18:11
  • @Jason - "All I want is to be able to read all my character values and be happy..." - lets make it c++ motto ) – SChepurin Feb 21 '13 at 18:18
1

There are two problems here. The first is that in Windows the default mode for cin is text and not binary, resulting in certain characters being interpreted instead of being input into the program. In particular the 26th character, Ctrl-Z, is being interpreted as end-of-file due to backwards compatibility taken to an extreme.

The other problem is due to the way cin >> works - it skips whitespace. This includes space obviously, but also tab, newline, etc. To read every character from cin you need to use cin.get() or cin.read().
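
A small sketch of that difference, using `std::istringstream` as a stand-in for cin so it can be run without piping input. The function names `count_with_extraction` and `count_with_get` are invented for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts characters read with operator>>, which skips
// whitespace before each extraction.
std::size_t count_with_extraction(const std::string& data) {
    std::istringstream in(data);
    std::size_t n = 0;
    char c;
    while (in >> c) {
        ++n;
    }
    return n;
}

// Hypothetical helper: counts characters read with get(), which is
// unformatted input and returns whitespace like any other character.
std::size_t count_with_get(const std::string& data) {
    std::istringstream in(data);
    std::size_t n = 0;
    char c;
    while (in.get(c)) {
        ++n;
    }
    return n;
}
```

For the input `"a b\tc\n"`, the first function sees only the three letters while the second sees all six characters.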

Mark Ransom
  • I am using `cin.get()` and `cin.read()`. Also, on Unix, it reads up to 127 characters but not more. Any insight into that? – Jason Feb 21 '13 at 20:16
  • @Jason, use `od -b` to make sure the file contains the characters you think it does. I don't have a Unix or Linux system on hand to evaluate, but I did test it with Cygwin on Windows and it works properly. I can edit the code into the answer if it would help. – Mark Ransom Feb 21 '13 at 20:19