Python vs Perl and byte count correctness

Question

When I pipe the same string to wc -c to get a byte count, the result differs by one byte between Python and Perl.

Why is that?

Is this problem exclusive to chars or can this arise in other types?

If so, is there a known offset table for each type?

$ python -c 'print("A")' | wc -c
2
$ python -c 'print("A" * 50)' | wc -c
51

$ perl -e 'print "A"' | wc -c
1
$ perl -e 'print "A" x 50' | wc -c
50

score 3 · Answer 1 · answered Apr 17 '20 at 13:50

Python's print("...") is essentially the same as Perl's print "...\n": Python appends a newline on its own, while Perl does not (although Perl's say does).
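As a minimal sketch of that difference from the Python side (the variable names are only illustrative), the snippet below writes the same string once with print() and once with a plain write(), using an in-memory buffer in place of the pipe to wc -c:

```python
import io

# print() appends "\n" by default, just like Perl's print "A\n".
buffer = io.StringIO()
print("A", file=buffer)
with_newline = buffer.getvalue()

# write() emits only what it is given, like Perl's plain print "A".
buffer = io.StringIO()
buffer.write("A")
without_newline = buffer.getvalue()

# Byte counts of the two outputs, analogous to what wc -c reports.
print(len(with_newline.encode("utf-8")), len(without_newline.encode("utf-8")))  # 2 1
```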

Steffen Ullrich

Thank you sir. Just for clarification, since this one-byte offset is consistent: `python -c 'h = {"A": 12, "B": 12}; print(h)' | wc -c` returns 19, so should I just subtract 1 to get the proper count? – midastown Apr 17 '20 at 14:07
https://stackoverflow.com/questions/493386/how-to-print-without-newline-or-space might be useful for you – AKHolland Apr 17 '20 at 14:09

I would think the appropriate thing to do would either be to not emit a newline in the Python code, or add a newline to the Perl code output, or trim a final newline if present, and then do a byte count. In your specific example subtracting one from the Python byte count would be correct, but would expose you to error if the code were ever updated to not emit a trailing newline. – DavidO Apr 17 '20 at 14:41
@mehdi: depending on the OS (Unix, MacOS, Windows) the output separator might be one or two bytes (`\n` vs `\r\n`). So it is better to make sure the output is exactly the same instead of assuming what the difference will be. See the (already mentioned) question [How to print without newline or space?](https://stackoverflow.com/questions/493386/how-to-print-without-newline-or-space) on how to print only the actual content in Python. – Steffen Ullrich Apr 17 '20 at 15:34
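
The "trim a final newline if present, then count" approach from the comments could look roughly like the sketch below. The function name is hypothetical, and it assumes the output is already available as bytes (in a pipeline it could come from `sys.stdin.buffer.read()`). It drops at most one trailing line terminator, handling both `\n` and `\r\n`, instead of blindly subtracting one.

```python
def byte_count_without_final_newline(data: bytes) -> int:
    """Count bytes after dropping one trailing line terminator, if present."""
    if data.endswith(b"\r\n"):
        data = data[:-2]      # Windows-style terminator: two bytes
    elif data.endswith(b"\n"):
        data = data[:-1]      # Unix-style terminator: one byte
    return len(data)


# The same count comes out whether or not a terminator was emitted.
print(byte_count_without_final_newline(b"A\n"))    # 1
print(byte_count_without_final_newline(b"A\r\n"))  # 1
print(byte_count_without_final_newline(b"A"))      # 1
```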

brian d foy · Accepted Answer · 2020-04-18T18:22:36.030

Perl and Python choose different defaults for the output record separator. You can see the extra newline when you look at the output as octets:

$ python -c 'print("A")' | hexdump
0000000 41 0a
0000002

$ perl -e 'print "A"'  | hexdump
0000000 41
0000001

That's not the only difference. Python also puts a space between the arguments to print, whereas Perl adds nothing. Ruby's puts adds a newline after each argument:

$ python -c 'print("A", "B")' | hexdump
0000000 41 20 42 0a
0000004

$ perl -e 'print "A", "B"'  | hexdump
0000000 41 42
0000002

$ ruby -e 'puts( "A", "B" )' | hexdump
0000000 41 0a 42 0a
0000004

Perl can add the newline for you. On the command line, the -l switch does that automatically for print (but not printf). Inside the code, say does the same, though it still adds nothing between arguments. The -E switch is like -e but enables the features added since v5.10, of which say is one:

$ perl -le 'printf "%s%s", "A", "B"'  | hexdump
0000000 41 42
0000002

$ perl -le 'print "A", "B"'  | hexdump
0000000 41 42 0a
0000003

$ perl -lE 'say "A", "B"'  | hexdump
0000000 41 42 0a
0000003

When you decompile one of these, you can see that Perl is merely setting the output record separator, $\, for you:

$ perl -MO=Deparse -le 'print "A", "B"'
BEGIN { $/ = "\n"; $\ = "\n"; }
print 'A', 'B';
-e syntax OK

Since $\ is an ordinary global variable, you can also set it yourself:

$ perl -e '$\ = "\n"; print "A", "B"'  | hexdump
0000000 41 42 0a
0000003

Perl controls the characters between arguments to print and say with the $, variable, so you can set that:

$ perl -lE '$, = " "; say "A", "B"'  | hexdump
0000000 41 20 42 0a
0000004

In Python you go in the opposite direction because it has different defaults. This is for Python 3:

$ python -c 'print("A", "B", sep="", end="")' | hexdump
0000000 41 42
0000002
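
To tie this back to the question about other types, here is a rough Python 3 sketch (the helper name `emitted` is made up for illustration). It captures what print() writes under different sep and end settings, which play roughly the roles of Perl's $, and $\. It also suggests that the extra newline does not depend on the type of the value printed:

```python
import io


def emitted(*args, sep=" ", end="\n"):
    """Return exactly what print() writes for the given arguments."""
    buffer = io.StringIO()
    print(*args, sep=sep, end=end, file=buffer)
    return buffer.getvalue()


# Python defaults: a space between arguments, a newline at the end.
assert emitted("A", "B") == "A B\n"

# Perl's default print: nothing between arguments, nothing at the end.
assert emitted("A", "B", sep="", end="") == "AB"

# Comparable to perl -l: a newline at the end only.
assert emitted("A", "B", sep="") == "AB\n"

# The trailing newline is appended whatever the argument type is.
for value in ("A", 12, 3.5, [1, 2], {"A": 12, "B": 12}):
    text = emitted(value)
    assert text == str(value) + "\n"
    print(type(value).__name__, len(text.encode("utf-8")))
```

So there is no per-type offset table to look up: the difference is always whatever end is set to, on top of the bytes of str(value).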
