1

In the code below I create two files, one in text mode and the other in binary mode. The icons of the two files look the same, and so do all of their characteristics, including the size, the charset (binary), and the stream type (octet-stream). Why isn't one of them a text file? If I create a text file explicitly, its charset is ASCII.

Compiler version - gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0.

Operating system - Tried on both Ubuntu 18.10 and 19.04.

No messages displayed by compiler.

Command used to examine the files: file --mime

Output by the command for file Text1.txt : Text1.txt: application/octet-stream; charset=binary

Output by the command for file Binary : Binary: application/octet-stream; charset=binary

Output of the command od -xa FILENAME is the same for both files:

0000000    0021
          !
0000001

#include <stdio.h>

void main() {
    FILE *fp;
    FILE *fp2;
    int a = 10111110;

    fp2 = fopen("Text1.txt", "w");
    fputc('!', fp2);

    fp = fopen("Binary", "wb");
    fputc('!', fp);
}

Expected output: one file with charset ASCII and one with charset binary. Actual output: both files have charset binary.

0ne0rZer0
  • 3
    Any digital file is "binary". – alk Jul 07 '19 at 17:51
  • 2
    You are compiling and running this on which OS? – alk Jul 07 '19 at 17:53
  • edited for os and compiler details – 0ne0rZer0 Jul 07 '19 at 18:07
  • 2
    Why are you saying they are binary? Because you do not have a `\r` character? It is not produced except under Windows or if you explicitly write it – bruno Jul 07 '19 at 18:08
  • The command "file --mime" states that both the files have charset of binary, instead of ascii in the supposedly "text" file – 0ne0rZer0 Jul 07 '19 at 18:10
  • Really? Very strange. Sorry to ask, but are you sure you checked the right files and that your program successfully wrote to them? The two files must have the same contents (compare them with the `cmp Text1.txt Binary` command). Do you get the expected result from `cat Text1.txt`? – bruno Jul 07 '19 at 18:13
  • Closely related: [Difference between files written in binary and text mode](https://stackoverflow.com/q/229924/2402272) – John Bollinger Jul 07 '19 at 18:15
  • "Explicitly the charset is ASCII": no, it's not. You are writing a string literal. So, the value uses the `-fexec-charset` passed to or defaulted by the compiler. – Tom Blodget Jul 08 '19 at 17:08

2 Answers

3

The file command diagnoses the files as binary and not ASCII because you are writing non-ASCII characters to the files due to incorrect use of fputc.

fputc("!",fp2); is incorrect. The first argument to fputc should be an int with a character value. "!" is a string literal, which is an array, which is automatically converted to a pointer to its first character.

GCC warns you about this, saying “warning: passing argument 1 of 'fputc' makes integer from pointer without a cast [-Wint-conversion]”. You apparently ignored the warning. Do not do that. When the compiler warns you about something, pay attention, diagnose the problem, and fix it.

The result is that the pointer is converted to an int, and this int is passed to fputc. That may result in some non-ASCII character being written to the file, which in turn causes the file command to diagnose the file as binary.

To fix this, change the string "!" to a single character '!', so that you pass a single character to fputc, with fputc('!',fp2);.

Additionally, main should not be declared with void main(). Declare it with int main(void) or int main(int argc, char *argv[]), or in another implementation-defined manner.

On Unix systems, the resulting files with the corrected code will be identical. Core Unix does not distinguish between text and binary files, except that some applications may use metadata (such as “extended attributes”) to characterize files in various ways. The files resulting from the incorrect code may or may not be identical, because identical string literals in different places may or may not have the same address, so the resulting pointer may or may not have the same value.

Eric Postpischil
  • oh, `fputc("!",fp2);` / `fputc("!",fp);` well seen, UV. – bruno Jul 07 '19 at 18:35
  • 1
    @bruno, no: `fputc("!", ...)` vs. `fputc('!', ...)`. – Matthieu Jul 07 '19 at 18:42
  • 1
    @Matthieu I wanted to say the OP used `fputc("!",fp2);` **and** `fputc("!",fp);` (not versus), so both files do not get the '!' as expected. I just wanted to congratulate Eric Postpischil for his good view, nothing more ;-) – bruno Jul 07 '19 at 18:44
  • 2
    Certainly these errors should be corrected, but it seems highly unlikely that the corrected program will produce output files that differ from each other, as the OP seems to expect. It might or might not change whether `file` guesses them to be text files. – John Bollinger Jul 07 '19 at 18:47
  • Even after the changes suggested in the answer, both files still have the charset as binary. – 0ne0rZer0 Jul 07 '19 at 19:04
  • @0ne0rZer0: Then you should show the exact compiler version, the exact operating system version, any messages displayed by the compiler during compilation, the exact command you use to examine the files, and the exact output of that command. Also show the output of `od -xa Text1.txt` and `od -xa Binary`. – Eric Postpischil Jul 07 '19 at 19:09
  • $od -xa Binary 0000000 0021 ! 0000001 $od -xa Text1.txt 0000000 0021 ! 0000001 – 0ne0rZer0 Jul 07 '19 at 19:16
  • @0ne0rZer0: Update the question with the information requested. Paste the exact and complete text, including the compiler version, the operating system version, any messages displayed by the compiler, the exact command you use to examine the files, the output of that command, and the complete output of `od -xa Text1.txt` and `od -xa Binary`. – Eric Postpischil Jul 07 '19 at 19:38
  • Done! Please check it out, – 0ne0rZer0 Jul 07 '19 at 20:47
  • @0ne0rZer0: The `od` output shows there is only a single character, “!”, in the file. You have changed the source code so that the other text is no longer written. With just a single character in the file, the `file` program cannot make a good diagnosis about what the contents are, and it is reporting “application/octet-stream; charset=binary” as a minimal case. If you put the other output commands back, `file` will report “text/plain; charset=us-ascii” for both files. – Eric Postpischil Jul 07 '19 at 20:56
0

C provides a distinction in principle between binary and text streams. Data traversing a text stream may be subject to implementation-dependent conversions:

Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

(C2011, 7.21.2/2)

In practice, however, the only conversion you will see for byte-oriented streams on any system you're likely to meet is line terminator conversions on systems (primarily Windows) that use carriage return / newline pairs for line terminators in text files. C text mode streams will convert between that external representation and C's newline-only internal representation.

On Linux and modern BSD-based macOS, however, there isn't even that -- these operating systems make no distinction in practice between text and binary files, and it is not at all surprising that your two mechanisms for producing a file yield identical files.

It is an entirely separate question how an external program that attempts to guess at file types might interpret any given file, especially a very short one. Your chances are better for a file to be detected as text if it contains genuine text in the form of words and sentences.

John Bollinger