First time asking a question here. I've usually found all my answers without needing to post, but today I'm stuck on a small program (I'm not a programmer, so I may be doing it wrong). Here is the problem: I'm reading a log file while looking for some keywords, which is fairly simple. Sometimes the log file contains lines with lots of control characters (which I don't understand and are of no use to me), and these cause my program to stop reading, like this:

 Bla bla bla KEYWORD
 Bla Bla [SUB][EM][ACK] (and a lot more)
 Bla Bla KEYWORD

I read the first keyword, but the control characters seem to act like end-of-file markers for my loop, so I never read anything after that. Here is what I do:

FILE *fpIn = fopen(inFile, "r");
char chaine[100];
char searchKeyword[] = "KEYWORD";

while (!feof(fpIn))
{
    fgets(chaine, 100, fpIn);

    if(strstr(chaine, searchKeyword))
    {
        // do whatever...
    }
}

If anyone can give me a hint on how to avoid those characters in a simple way, I would really appreciate it! Thank you!

nGz
  • Welcome to Stack Overflow! Please see [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/q/5431941/2173917) – Sourav Ghosh Jan 08 '18 at 11:34
  • Assuming [ASCII](http://en.cppreference.com/w/c/language/ascii), those `"[SUB]"`, `"[EM]"` and `"[ACK]"` would really be the values `0x1a`, `0x19` and `0x06` respectively? – Some programmer dude Jan 08 '18 at 11:37
  • @Someprogrammerdude I believe so, but I don't get any "0x.." when I output the line I just read – nGz Jan 08 '18 at 12:53
  • @nGz 0x1A is just a way of writing the number 26 in hexadecimal. The character has the code value 26. – rici Jan 08 '18 at 13:00

2 Answers

If you are using Windows and the file is opened in text mode, the control character 0x1A (Control-Z or SUB) will be treated as an end-of-file indication.

You can avoid that by opening the file in binary mode (using "rb" instead of "r" in the fopen call), but then you will find that all of your lines have a \r (0x0D) just before the final \n. (In text mode, line endings are translated to a single \n.)
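If the leftover \r gets in the way, one possible way to deal with it is to trim the line ending yourself after each fgets. This is only a sketch: strip_eol is a made-up helper name, and it assumes a \r only ever appears as part of a line ending.

```c
#include <string.h>

/* Hypothetical helper (not from the question): cut the string at the
   first '\r' or '\n', so "text\r\n" and "text\n" both become "text". */
static void strip_eol(char *line)
{
    /* strcspn returns the length of the prefix containing neither '\r'
       nor '\n'; if neither is present, that is the index of the
       existing terminator, so the string is left unchanged. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

You would call strip_eol(chaine) right after a successful fgets and before the strstr test.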

rici

Assuming [SUB] is effectively ASCII code 0x1A, it used to be the end-of-(text-)file marker in CP/M. For compatibility reasons it kept this role in MS-DOS, and nobody has bothered to clean that up in recent versions of Windows.

The simplest way to get past this problematic byte is to open the file in binary mode. The \r will then not be removed from the end of lines (line endings are \r\n on Windows and just \n on Linux), but at least 0x1A will not be seen as an end of file.
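Put together, the reading loop from the question could look roughly like the sketch below. The function name count_keyword_lines and its -1 error value are assumptions made for illustration. It also loops on the return value of fgets rather than on feof, as suggested in the comments, and it assumes, like the original code, that lines fit in the 100-byte buffer (a longer line is read in several pieces) and contain no NUL bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (names are illustrative): returns how many lines
   of the file contain keyword, or -1 if the file cannot be opened. */
static long count_keyword_lines(const char *path, const char *keyword)
{
    /* "rb" = binary mode, so 0x1A is read as an ordinary byte. */
    FILE *fp = fopen(path, "rb");
    char line[100];
    long hits = 0;

    if (fp == NULL)
        return -1;

    /* fgets returns NULL at end of file or on error, so no feof() test
       is needed to end the loop. */
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Drop the "\r\n" (or "\n") that binary mode leaves in place. */
        line[strcspn(line, "\r\n")] = '\0';

        if (strstr(line, keyword) != NULL)
            hits++;   /* "do whatever..." from the question goes here */
    }

    fclose(fp);
    return hits;
}
```

With the three sample lines from the question, the loop should now reach the second KEYWORD line instead of stopping at the control characters.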

Serge Ballesta