
I'm working on Ubuntu 16.04 (Xenial Xerus). I found out that text editors write an additional byte to UTF-8 text files. This caused problems for me when I tried to pass tests.

So we have a string, "Extra byte", whose size is 10 bytes in UTF-8. When I write it to a file with gedit, for example, I get a file with a size of 11 bytes. nano produces the same size, and even echo "Extra byte" > filename produces 11 bytes.

However, when we try something like this:

#include <fstream>

int main(){
    std::ofstream file("filename");

    file<<"Extra byte";
    return 0;
}

or this:

with open("filename_py",'w+',encoding='UTF-8') as file:
    file.write('Extra byte')

We get a file with a size of 10 bytes. Why?
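A minimal Python sketch that reproduces the observation, assuming the extra byte is a trailing newline. It writes the same text with and without "\n" to temporary files and compares their sizes. The helper name written_size is hypothetical.

```python
import os
import tempfile


def written_size(text: str) -> int:
    """Write text as UTF-8 to a temporary file and return the file size in bytes."""
    # newline="" disables newline translation, so "\n" is written as one byte
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", newline="", delete=False
    ) as f:
        f.write(text)
        path = f.name
    try:
        return os.path.getsize(path)
    finally:
        os.remove(path)


print(written_size("Extra byte"))    # 10, like std::ofstream or file.write()
print(written_size("Extra byte\n"))  # 11, like gedit, nano or echo
```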

2 Answers

You are seeing a newline character (often expressed in programming languages as \n, in ASCII it is hex 0a, decimal 10):

$ echo 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f0a                                foo.

The hex-dump tool xxd shows that the file consists of 4 bytes: hex 66 (ASCII lowercase letter f), hex 6f twice (lowercase letter o), and the newline (hex 0a).
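The same bytes can be inspected from Python, as a rough stand-in for xxd. This sketch encodes the text that echo 'foo' writes, including its trailing newline; bytes.hex with a separator argument assumes Python 3.8 or later.

```python
# Encode the same text that `echo 'foo'` writes, including its trailing newline
data = "foo\n".encode("utf-8")

print(len(data))      # 4
print(data.hex(" "))  # 66 6f 6f 0a  (f, o, o, newline)

# The newline is byte 0x0a, decimal 10
assert data[-1] == 0x0A == ord("\n") == 10
```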

You can use the -n command-line switch to disable adding the newline:

$ echo -n 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f                                  foo

or you can use printf instead, which is more portable (POSIX leaves the behavior of echo -n implementation-defined):

$ printf 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f                                  foo

Also see 'echo' without newline in a shell script.

Most text editors also add a newline to the end of the file; how to prevent this depends on the editor (often you can simply delete the final line break before saving). There are also various command-line ways to remove the newline after the fact; see How can I delete a newline if it is the last character in a file?.
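If you prefer to strip the newline from a program rather than with a command-line tool, here is a minimal Python sketch. The function name remove_final_newline is hypothetical, and it assumes Unix line endings: it removes a single trailing "\n" byte and does not handle Windows "\r\n".

```python
import os
import tempfile


def remove_final_newline(path: str) -> bool:
    """Remove one trailing b"\\n" from the file; return True if one was removed."""
    # Assumption: Unix line endings only; a trailing "\r\n" leaves the "\r" behind
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False
        f.seek(size - 1)
        if f.read(1) != b"\n":
            return False
        f.truncate(size - 1)  # drop only the last byte
        return True


# Usage: a file like the one `echo "Extra byte" > filename` produces
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Extra byte\n")
print(remove_final_newline(tmp.name), os.path.getsize(tmp.name))  # True 10
print(remove_final_newline(tmp.name), os.path.getsize(tmp.name))  # False 10
```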

Text editors generally add a newline because they deal with text lines, and the POSIX standard defines that text lines end with a newline:

3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
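Under this definition, "Extra byte" without a newline contains zero complete lines. The sketch below counts lines the same way wc -l does, by counting newline characters, and flags trailing bytes that are not terminated by a newline. Both function names are hypothetical.

```python
def posix_line_count(data: bytes) -> int:
    """Count complete lines as POSIX defines them: each ends with a newline."""
    # Same rule `wc -l` uses: count the newline characters
    return data.count(b"\n")


def has_incomplete_last_line(data: bytes) -> bool:
    """True if there are bytes after the last newline (an unterminated line)."""
    return len(data) > 0 and not data.endswith(b"\n")


print(posix_line_count(b"Extra byte"))            # 0
print(has_incomplete_last_line(b"Extra byte"))    # True
print(posix_line_count(b"Extra byte\n"))          # 1
print(has_incomplete_last_line(b"Extra byte\n"))  # False
```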

Also see Why should text files end with a newline?

Many editors, including gedit and nano, have a feature that adds a newline character at the end of the file. std::ofstream has no such feature, because it is used for writing non-text files as well as text files.
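If a test expects the same bytes an editor would save, a program has to add the newline itself. A minimal Python sketch that roughly mimics that editor behavior; the name write_text_file is hypothetical, and leaving empty text empty is an assumption (a file with zero lines still satisfies the POSIX definition).

```python
import os
import tempfile


def write_text_file(path: str, text: str) -> None:
    """Write text as UTF-8, appending a final newline if it is missing."""
    # Roughly what gedit and nano do: a non-empty document ends with "\n".
    # Assumption: empty text stays empty (zero lines is still a text file).
    if text and not text.endswith("\n"):
        text += "\n"
    # newline="" writes "\n" as a single byte on every platform
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pass
write_text_file(tmp.name, "Extra byte")
print(os.path.getsize(tmp.name))  # 11
os.remove(tmp.name)
```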

The feature exists because, as defined by POSIX, a text file consists of lines, and by definition a line ends with a newline character.

3.206 Line

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

3.403 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
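A rough Python check of this definition, as a sketch only. It assumes LINE_MAX is 2048 (the POSIX minimum, _POSIX2_LINE_MAX; the real value on a system can be read with getconf LINE_MAX) and does not verify that the bytes are valid characters in the current locale. The function name is hypothetical.

```python
def is_posix_text_file(data: bytes, line_max: int = 2048) -> bool:
    """Rough check of the POSIX 'text file' definition (3.403)."""
    # Assumption: line_max defaults to the POSIX minimum LINE_MAX of 2048
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if data and not data.endswith(b"\n"):
        return False  # the last line lacks its terminating newline
    # Each line's length counts its terminating newline, hence the + 1
    return all(len(line) + 1 <= line_max for line in data.split(b"\n")[:-1])


print(is_posix_text_file(b"Extra byte\n"))  # True
print(is_posix_text_file(b"Extra byte"))    # False
```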

In gedit, you can display the final newline as a blank line at the end of the file (and stop gedit from adding one) by setting the configuration key `ensure-trailing-newline` to false with the command: `gsettings set org.gnome.gedit.preferences.editor ensure-trailing-newline false` – Yogev Neumann Jul 28 '22 at 02:28