I want to read a file backwards, from the end to the beginning. This works, but I would like not only to get the characters from the file, but also to remove them as I read.

std::fstream fileA;
fileA.seekg(-1, fileA.end);
int size = fileA.tellg();
for (i = 1; i <= size; i++)
{
    fileA.seekg(-i, fileA.end);
    fileA.get(ch);

    std::cout << ch;
}

Is there any way to do this without copying the content and creating a new file that leaves out what I have read?

Noam
  • "This works": I wonder if you've tried on an utf8 file – Christophe Apr 07 '15 at 15:45
  • Well, Notepad says it is in UTF-8 without BOM. Does that matter? The code works for reading... @Christophe – Noam Apr 07 '15 at 15:47
  • The problem in your approach is multibyte UTF-8 characters. Let's take the example of small pi: its UTF-8 encoding is 0xCF 0x80. If you write 0x80 0xCF to your output, it's an invalid UTF-8 sequence. But the same problem occurs under Windows for any text: '\n' is encoded in files as 0x0D 0x0A. When reading in text mode, you'll only get '\n' when reading this sequence. But with your approach, you'll first position on 0x0A, which will get you '\n', and then you'll position on 0x0D, which will be read as '\n' again (because it's followed by 0x0A). So you'll double every newline. – Christophe Apr 07 '15 at 16:01
  • have you looked at [this](http://stackoverflow.com/questions/10813930/read-a-file-backwards)? – NathanOliver Apr 07 '15 at 16:02
  • @Christophe: this is pretty easy to work around for UTF-8 or UTF-16 though--you can tell from the value that the 0x80 is part of a multi-byte sequence, and you can tell when you've reached the first byte. It's *much* more difficult to deal with combining diacritics though--when you read one code point, you don't know if it may be preceded by a combining diacritic except by reading the preceding code point. – Jerry Coffin Apr 07 '15 at 16:07
  • @NathanOliver thanks, but it is not what I am looking for. I simply want to extract one character at a time from the end of a file. – Noam Apr 07 '15 at 16:08
  • Related: http://stackoverflow.com/q/9026734/1025391 – moooeeeep Apr 07 '15 at 16:36

2 Answers

This really isn't possible without using one of the methods outlined here or here. If you look at an istream_iterator, you will see that it is an input iterator (24.6.1)(1):

The class template istream_iterator is an input iterator

Then from (24.2.1)(table 105)

Random Access -> Bidirectional -> Forward -> Input
                                          -> Output

As you can see, an input iterator is more restrictive than a forward iterator, and a forward iterator can only go in one direction. Because of this behavior, there isn't a standard way to start at the end of an input stream and walk backwards.
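
To illustrate the iterator categories above, here is a minimal sketch of one possible workaround: read the stream forward once into a std::string, which does offer reverse iterators. The helper name read_reversed is hypothetical and not part of this answer, and the approach copies the whole file into memory, so it only suits files that fit there.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (an assumption for illustration): loads the whole file
// into memory and returns its bytes in reverse order.
std::string read_reversed(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    // istreambuf_iterator is only an input iterator, so read forward once...
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    // ...then walk the in-memory copy backwards with reverse iterators.
    return std::string(bytes.rbegin(), bytes.rend());
}
```

Calling read_reversed("test.txt") would return the raw bytes of that file back to front; a file that cannot be opened yields an empty string in this sketch.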

NathanOliver
  • That there is no standard iterator that can do the trick does not mean that there is no standard way to do it. Still an upvote for the two linked questions/answers. Especially the `mmap()` approach is worth considering. – cmaster - reinstate monica Apr 07 '15 at 16:40

If you just want to take the binary data and present it in reverse order, regardless of its meaning, your code is OK.

Some recommendations:

  • You should then open the stream in binary mode for consistency across platforms (i.e. to avoid a newline being turned into a double newline on platforms such as Windows, which encode it as 0x0d,0x0a).

  • You could also consider seeking relative to the current position inside the loop to navigate backwards, instead of always going to the end and repositioning yourself by an absolute offset from the end.

Here is the fine-tuned code:

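std::ifstream fileA("test.txt", std::ios::binary);  // binary mode, because we reverse raw bytes
fileA.seekg(-1, std::ios::end);  // position on the last char to be read
char ch;
// read one char, then step back two: over the char just read and onto the one before it;
// the loop ends when seeking before the start of the file fails
for (; fileA.get(ch); fileA.seekg(-2, std::ios::cur))
    std::cout << ch;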

Your code is, however, not able to handle UTF-8 encoded files properly, because multibyte sequences will be mechanically reversed, and their reversed version is invalid UTF-8:

  • This is not an issue if you only have ASCII characters in your file.
  • If UTF-8 consistency matters to you, you could consider a very simple workaround: if you read a byte u for which (u & 0xC0) == 0x80, it is a continuation byte, so you have to keep reading the preceding bytes until this condition becomes false (that byte is the leading byte), and then output the whole group of bytes (between 2 and 4 in valid UTF-8) in the correct order.

Here is how to do it:

...                           // If UTF-8 must be processed correctly
fileA.seekg(-1, std::ios::end);
char ch, buft[9]{}, *p = buft + 8;  // buft[8] stays '\0', so p always points to a C string
bool mb = false;                    // true while inside a multibyte sequence
for (; fileA.get(ch); fileA.seekg(-2, std::ios::cur))
{
    if (mb) {  // we are already processing a multibyte sequence
        if ((ch & 0xC0) == 0x80 && p != buft)  // yet another continuation byte?
            *--p = ch;
        else {
            std::cout << ch << p;  // otherwise output the leading byte followed by the continuation bytes collected so far
            mb = false;            // multibyte processing is then finished
        }
    }
    else if ((ch & 0xC0) == 0x80) {  // a new multibyte sequence is identified
        mb = true;                   // start processing it
        buft[7] = ch;
        p = buft + 7;
    }
    else std::cout << ch;  // normal chars are processed as before
}

Here is a runnable demo.
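
The same (u & 0xC0) == 0x80 rule can also be tried on an in-memory string, which is easier to test than a file. The following is a minimal sketch under that assumption; the helper names is_continuation and reverse_utf8 are hypothetical. It works on code points only, so combining diacritics (mentioned in the comments on the question) would still end up attached to the wrong base character.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx.
bool is_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper (an assumption for illustration): reverses the order of
// the code points in a UTF-8 string while keeping the bytes of each code
// point in their original order.
std::string reverse_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t end = in.size();  // one past the last byte not yet copied
    while (end > 0) {
        std::size_t start = end - 1;
        // walk back over continuation bytes until the leading byte is reached
        while (start > 0 && is_continuation(in[start]))
            --start;
        out.append(in, start, end - start);  // copy the group in forward order
        end = start;
    }
    return out;
}
```

For example, the bytes 'a', 0xCF, 0x80, 'b' (an "a", a small pi, a "b") should come back as 'b', 0xCF, 0x80, 'a', so the pi stays a valid two-byte sequence.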

Last point: removing the last byte from the file is operating system dependent. You should have a look at this SO question to get answers on how to do it on Linux/POSIX and Windows.
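
For compilers that support C++17, which postdates this answer, std::filesystem::resize_file offers a portable way to truncate a file. The sketch below assumes C++17 is available; the helper name pop_last_byte is hypothetical. It reads the last byte and then shrinks the file by one byte, so no copy of the file is made.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>

// Hypothetical helper (an assumption for illustration): returns the last byte
// of the file and removes it from the file. Returns std::nullopt if the file
// is empty, missing, or cannot be read or resized.
std::optional<char> pop_last_byte(const std::filesystem::path& path)
{
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(path, ec);
    if (ec || size == 0)
        return std::nullopt;

    char ch = 0;
    {
        std::ifstream in(path, std::ios::binary);
        in.seekg(-1, std::ios::end);  // position on the last byte
        if (!in.get(ch))
            return std::nullopt;
    }   // the stream is closed here, before the file is resized

    std::filesystem::resize_file(path, size - 1, ec);  // drop the last byte
    if (ec)
        return std::nullopt;
    return ch;
}
```

Calling pop_last_byte in a loop until it returns std::nullopt would hand back the bytes from the end to the beginning while emptying the file. Reopening the file for every byte is slow, so a real program would probably read a larger block from the end and truncate once per block.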

Christophe