I am writing a simple program to flip all the bits in a file. For now it only handles the first 1000 bytes, until I get that much working. Why does my call to read() ignore \r characters? When I run this code on a file that contains only \r\n\r\n, the read call returns 2 and the buffer contains \n\n. The \r characters are completely ignored. I'm running this on Windows (this wouldn't even be an issue on Linux machines).

Why does read(2) skip over the \r character when it finds it? Or is that what is happening?

EDIT: The conclusion is that Windows defaults to opening files in "text" mode rather than "binary" mode. For this reason, when calling open, we must include O_BINARY in the flags.

Thanks, code below.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>

void invertBytes(size_t amount, char* buffer);

int main(int argc, char** argv)
{
   int fileCount = 1;
   char* fileName;
   int fd = 0;
   size_t bufSize = 1000;
   ssize_t amountRead = 0;   /* read() returns ssize_t; -1 signals an error */
   char* text;
   long offset = 0;

   if(argc <= 1)
   {
      printf("Usage: encode [filenames...]\n");
      return 0;
   }

   text = (char *)malloc(sizeof(char) * bufSize);
   if(text == NULL)
   {
      perror("malloc");
      return 1;
   }

   for(fileCount = 1; fileCount < argc; fileCount++)
   {
      fileName = argv[fileCount];
      /* No O_BINARY here, so on Windows the file is opened in text mode. */
      fd = open(fileName, O_RDWR);
      printf("fd: %d\n", fd);
      if(fd < 0)
      {
         perror(fileName);
         continue;
      }
      amountRead = read(fd, (void *)text, bufSize);
      printf("Amount read: %ld\n", (long)amountRead);
      if(amountRead > 0)
      {
         invertBytes((size_t)amountRead, text);
         offset = (long)lseek(fd, 0, SEEK_SET);
         printf("Lseek to %ld\n", offset);
         offset = (long)write(fd, text, (size_t)amountRead);
         printf("write returned %ld\n", offset);
      }
      close(fd);
   }

   free(text);
   return 0;
}

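void invertBytes(size_t amount, char* buffer)
{
   size_t byteCount = 0;
   printf("amount: %lu\n", (unsigned long)amount);
   for(byteCount = 0; byteCount < amount; byteCount++)
   {
      /* Cast to unsigned char so bytes >= 0x80 print as two hex digits
         instead of a sign-extended value such as fffffff2. */
      printf("%x, ", (unsigned char)buffer[byteCount]);
      buffer[byteCount] = ~buffer[byteCount];
      printf("%x\n", (unsigned char)buffer[byteCount]);
   }
   printf("byteCount: %lu\n", (unsigned long)byteCount);
}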
Akron

2 Answers

4
fd = open(fileName, O_RDWR);

should be

fd = open(fileName, O_RDWR | O_BINARY);

See the related question "read() only reads a few bytes from file" for details.
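A minimal sketch of the read/flip/write sequence with that one-line change applied. The helper name flipPrefix is hypothetical, and the #ifndef fallback is an assumption for toolchains whose <fcntl.h> does not declare O_BINARY (it is not a POSIX flag), so that the same source can be built there too:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: where <fcntl.h> has no O_BINARY (e.g. Linux with glibc),
   define it as 0 so the same open() call compiles everywhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: flips the bits of up to bufSize bytes at the start
   of the file. Returns the number of bytes rewritten, 0 for an empty file,
   or -1 on error. */
static long flipPrefix(const char *fileName, char *buffer, size_t bufSize)
{
   int fd = open(fileName, O_RDWR | O_BINARY);  /* binary mode: no \r\n translation */
   ssize_t amountRead;
   ssize_t written = -1;
   ssize_t i;

   if (fd < 0)
      return -1;

   amountRead = read(fd, buffer, bufSize);
   if (amountRead > 0)
   {
      for (i = 0; i < amountRead; i++)
         buffer[i] = (char)~buffer[i];
      /* Rewind so the flipped bytes overwrite the originals. */
      if (lseek(fd, 0, SEEK_SET) == 0)
         written = write(fd, buffer, (size_t)amountRead);
   }
   else
      written = amountRead;  /* 0 for an empty file, -1 on read error */

   close(fd);
   return (long)written;
}
```

Called as flipPrefix(fileName, text, bufSize) on a file holding only \r\n\r\n, this should return 4 rather than 2, because the \r bytes now reach the buffer.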

Eugen Rieck
  • Wow, you are right. I changed that one line and it is now correctly processing the \r character. If I compiled this on a Linux machine, would I need to specify the O_BINARY mode? I know unix files do not typically contain \r. – Akron Jan 06 '12 at 20:33
  • I'd always use O_BINARY for binary files, it's just the polite thing to do and promotes portability for cross compiling with cygwin. – Joachim Isaksson Jan 06 '12 at 20:36
  • No, on linux there's no difference between binary mode and text(?) mode. But of course for portability, it's better to use O_BINARY anyway. – Daniel Fischer Jan 06 '12 at 20:36
  • @Akron this doesn't have to do with \r at all: Windows uses different modes to read a file as Text or Binary, AFAIK this concept doesn't exist in Linux. So: Yes, you are right, that you don't need O_BINARY on Linux, but no, that is not because of Linux text files using \n as a line delimiter, but because of Linux not using this (IMHO braindead) way to handle a file. – Eugen Rieck Jan 06 '12 at 20:38
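One way to see the difference these comments describe is to compare how many bytes read() delivers with the size fstat() reports. This is only a sketch: readMatchesFileSize is a hypothetical helper, not something from the thread. On Linux the two counts would be expected to agree whatever flags are passed. On Windows, a file containing \r\n that is opened in text mode would be expected to come up short.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic: returns 1 if read() delivered exactly as many
   bytes as fstat() reports for the file, 0 if the counts differ, and -1
   on error. `flags` is passed straight to open(). */
static int readMatchesFileSize(const char *fileName, int flags)
{
   struct stat info;
   char chunk[512];
   long total = 0;
   ssize_t n;
   int fd = open(fileName, flags);

   if (fd < 0)
      return -1;
   if (fstat(fd, &info) != 0)
   {
      close(fd);
      return -1;
   }

   /* Read to end of file, counting what actually arrives in the buffer. */
   while ((n = read(fd, chunk, sizeof chunk)) > 0)
      total += (long)n;
   close(fd);

   if (n < 0)
      return -1;
   return total == (long)info.st_size;
}
```

A call such as readMatchesFileSize(fileName, O_RDONLY) returning 0 is a hint that some translation happened between the file and the buffer.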
2

Try opening with O_BINARY to use binary mode. Text mode may be the default, and in text mode each \r\n pair is translated to \n on read, which is why the \r characters seem to be ignored.

open(fileName, O_RDWR|O_BINARY);

Joachim Isaksson