The right strategy for printing formatted strings from a single Python script, in both Python 2 and 3?

Question

I know I'm quite late with all this, but I have a relatively simple command-line Python script, written for 2.7, which I'd like to make usable on both Python 2.7+ and Python 3+. Since it's a single script:

I do not want to use six: while six is just a single file, I would then have to take care of two files (the six module and my script) instead of one
I do not want to use 2to3, because then again I would have to take care of two files (the 2.7 version of my script and the 3.2 version of it) instead of one

So I thought the best approach for me would be to write Python 2.x code that is as compatible with Python 3.x as possible. Then I could code once and not worry if I have to run the script on a USB-thumbdrive OS that may only have Python 2.7 (or, for that matter, only Python 3+), and on which finding and/or installing the right version of Python may be difficult.
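One way to keep that single-file goal without six is to define the handful of version-dependent names directly at the top of the script. The following is a minimal sketch, not part of the original script; the names `PY2`, `text_type`, `string_types` and `is_text` are hypothetical (they echo six's naming, but nothing is imported from six).

```python
import sys

# Hypothetical helper names: a few lines like these can live at the top of
# the single script instead of shipping six alongside it.
PY2 = sys.version_info[0] == 2

if PY2:
    # These builtins exist only on Python 2; this branch never runs on
    # Python 3, so the missing names are never looked up there.
    text_type = unicode           # noqa: F821
    string_types = (basestring,)  # noqa: F821
else:
    text_type = str
    string_types = (str,)


def is_text(value):
    """Return True for basestring instances on 2.x, str instances on 3.x."""
    return isinstance(value, string_types)
```

Because the version check happens at run time and only touches names, not syntax, the same file compiles on both major versions.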

To demonstrate my problems, here is a sample script based on the examples in Learning Python (sample chapter 9, "Common Tasks in Python"), together with the preparation in bash on Ubuntu 11.04 (with a bit of Unicode to spice it up):

cd /tmp

mkdir /tmp/ptest
echo 'Байхъусут, зæрæдтæ!.. Байхъусут, лæппутæ!..' > /tmp/ptest/test.txt
echo 'Байхъусут, зæрæдтæ!.. Байхъусут, лæппутæ!..
Байхъусут зарæгмæ, фыдæлты кадæгмæ,
Дзæбæхдæр бахъырнут уæ бæзджын хъæлæстæй!..' > /tmp/ptest/Байхъусут.txt

cat > tscript.py <<"EOF"
# -*- coding: utf-8 -*-
import fileinput, sys, string, os

if ( len(sys.argv) > 3 ) or ( len(sys.argv) < 2 ):
  print "Usage: ", sys.argv[0], "searchterm [path]"
  sys.exit()

# take the first argument out of sys.argv and assign it to searchterm
searchterm, sys.argv[1:] = sys.argv[1], sys.argv[2:]

if len(sys.argv) == 1:                  # if no dir is specified,
  indir = os.curdir                     #   use current dir
else:                                   # otherwise, use dir specified
  indir = sys.argv[1]                   #   on the command line

filenames = [indir+"/"+f for f in os.listdir(indir) if os.path.isfile(indir+"/"+f)]

for line in fileinput.input(filenames):
  num_matches = string.count(line, searchterm)
  if num_matches:                     # a nonzero count means there was a match
    print "found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
      fileinput.filelineno()
EOF

Trying this:

$ python2.7 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line  1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line  1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line  2

$ python3.2 tscript.py Байхъусут /tmp/ptest
  File "tscript.py", line 17
    print "Usage: ", sys.argv[0], "searchterm [path]"
                  ^
SyntaxError: invalid syntax

OK, that must be the change to print. Will just adding parentheses do? I change it like this:

  print ("Usage: ", sys.argv[0], "searchterm [path]")
  ....
    print ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
      fileinput.filelineno() )

... will that do?

$ python3.2 tscript.py Байхъусут /tmp/ptest
Traceback (most recent call last):
  File "tscript.py", line 31, in <module>
    num_matches = string.count(line, searchterm)
AttributeError: 'module' object has no attribute 'count'

Nope. So I also change this line:

  num_matches = line.count(searchterm) # string.count(line, searchterm)
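The replacement works because `count` is a method on string objects in both 2.x and 3.x, while the module-level `string.count()` function was removed in Python 3. A small check, using the first line of the sample test file:

```python
# -*- coding: utf-8 -*-
import string

line = "Байхъусут, зæрæдтæ!.. Байхъусут, лæппутæ!.."

# The method form is available on str objects in 2.x and 3.x alike.
print(line.count("Байхъусут"))   # 2

# The function form lived in the string module and is gone in Python 3.
print(hasattr(string, "count"))  # False on 3.x, True on 2.x
```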

... is that enough? Well, somewhat, it seems:

$ python3.2 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line  1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line  1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line  2
$ python2.7 tscript.py Байхъусут /tmp/ptest
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 2 times in /tmp/ptest/test.txt on line ", 1)
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 2 times in /tmp/ptest/\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82.txt on line ", 1)
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 1 times in /tmp/ptest/\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82.txt on line ", 2)

Now at least it doesn't crash, but the Python 2.7 print statement sees a tuple and prints the repr() of that tuple, so the UTF-8 bytes of the string inside it show up as \x escapes instead of readable text ...
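The effect can be reproduced without Python 2: `str()` of a tuple is built from `repr()` of each element, and on 2.x the repr of a byte string escapes every non-ASCII byte. The values `'x'` and `test.txt` below are placeholders. The second half shows the usual workaround of passing print a single, already-formatted string.

```python
# The Python 2.7 statement  print ("text", 1)  does not call a function: it
# prints one value, the tuple ("text", 1), and str() of a tuple is built
# from repr() of each element.
args = ("found 'x' 2 times in test.txt on line ", 1)
shown = str(args)
print(shown)  # ("found 'x' 2 times in test.txt on line ", 1)

# Building a single string first sidesteps the issue: print ("one string")
# is just a parenthesized expression to the 2.7 statement and a normal call
# to the 3.x function, so both write the same text.
single = "found '%s' %d times in %s on line %d" % ("x", 2, "test.txt", 1)
print(single)  # found 'x' 2 times in test.txt on line 1
```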

So apparently I now want to import print_function from __future__ for Python 2.7 (see also the question "Which python version needs from __future__ import with_statement?"). I try to put this at the top of the file (after the coding line), thinking it would be better to do the import only for the 2.x version:

import __future__, sys
if sys.version_info[0] < 3:
  from __future__ import print_function
else:
  pass

... but I get:

$ python2.7 tscript.py Байхъусут /tmp/ptest
  File "tscript.py", line 6
    from __future__ import print_function
SyntaxError: from __future__ imports must occur at the beginning of the file
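The rule behind this error can be checked with `compile()`, which parses a source string without running it. This is a small sketch; the three source snippets are made up for illustration. A `__future__` import may be preceded only by comments, blank lines, a docstring and other `__future__` imports, so placing it inside an `if` is rejected whatever the condition would evaluate to.

```python
# Each source string is compiled, not executed, so this only tests whether
# the placement of the __future__ import is accepted by the compiler.
SOURCES = {
    "at top": "from __future__ import print_function\nimport sys\n",
    "after comment and docstring": (
        '# a comment\n"""Docstring."""\nfrom __future__ import print_function\n'
    ),
    "inside an if": (
        "import sys\n"
        "if sys.version_info[0] < 3:\n"
        "    from __future__ import print_function\n"
    ),
}


def placement_ok(src):
    """Return True if src compiles, False if the compiler rejects it."""
    try:
        compile(src, "<test>", "exec")
        return True
    except SyntaxError:
        return False


for name in sorted(SOURCES):
    print(name, placement_ok(SOURCES[name]))
```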

The answer to this, in the question "Python graceful future feature (__future__) import", is to use a wrapper .py file; but then I have the same problem again of having to think about two files instead of one.

I thought I could cheat like this, even if it does create an extra file:

import __future__, sys
if sys.version_info[0] < 3:
  str = """from __future__ import print_function"""
  f = open('compat23.py','w')
  f.write(str)
  f.close()
  import compat23
  print("sys.version_info[0] < 3", end='(')
else:
  print("sys.version_info[0] >= 3", end=')')

... but that doesn't really help:

$ python2.7 tscript.py Байхъусут /tmp/ptest
  File "tscript.py", line 11
    print("sys.version_info[0] < 3", end='(')
                                        ^
SyntaxError: invalid syntax

... because the __future__ import was in effect only within the newly created compat23 module, apparently.

So:

I am apparently making a mistake trying to limit the __future__ import to versions below 3, given that from __future__ ... is a compile-time statement; but then:
How does Python 3 react to this statement? Does it simply get ignored?
And what happens when, in Python 4, they decide to deprecate print again? Wouldn't from __future__ import print_function then take on a meaning again in Python 3, even if it is ignored there currently?

So, I guess, if I want to avoid thinking about this and still use a single-file script, I'm down to the advice in noconv.html: "... or you can use a separate print function that works under both Python 2 and Python 3 .. the trick is to use sys.stdout.write() and formatting ...."; the same advice appears in "Making code compatible with Python 2 and 3" on Eli Bendersky's website.

And so I try this at the start of the file, instead of the __future__ import part, and change the corresponding print statements:

import sys

def printso(*inargs):
  # join str() of each argument with single spaces, like print(), then end the line
  outstr = " ".join(str(inarg) for inarg in inargs) + "\n"
  sys.stdout.write(outstr)

  ....
  printso ("Usage: ", sys.argv[0], "searchterm [path]")
  ....
    printso ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
      fileinput.filelineno() )

... and this does, indeed, work fine in both python 2.7 and 3.2:

$ python2.7 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line  1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line  1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line  2
$ python3.2 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line  1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line  1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line  2
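If the helper later needs more of print()'s behavior, it can accept the same keyword arguments. This is a hypothetical extension of printso, not something from the sources above. Keyword-only parameters after `*args` are Python 3 syntax, so the sketch pulls `sep`, `end` and `file` out of `**kwargs` to stay valid on 2.7.

```python
import sys


def printso(*args, **kwargs):
    """Hypothetical print() stand-in that parses on both 2.7 and 3.x."""
    sep = kwargs.pop("sep", " ")
    end = kwargs.pop("end", "\n")
    stream = kwargs.pop("file", sys.stdout)
    if kwargs:
        # Reject anything print() itself would not understand.
        raise TypeError("unexpected keyword arguments: %s"
                        % ", ".join(sorted(kwargs)))
    stream.write(sep.join(str(arg) for arg in args) + end)


printso("Usage:", "tscript.py", "searchterm [path]")
printso("no newline here", end="")
```

Defaulting `stream` at call time, rather than in the signature, means a later reassignment of `sys.stdout` is still honored.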

OK, but now it turns out that percent-sign (%) string formatting is slated for deprecation as well, so instead I should write:

  #printso ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
  #  fileinput.filelineno() )
  printso ("found '{0}' {1} times in {2} on line ".format(searchterm, num_matches, fileinput.filename() ), \
    fileinput.filelineno() )
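The two styles produce identical text, which is easy to check with the values from the sample run. Note that explicit field numbers such as `{0}` are needed if Python 2.6 or 3.0 must also be supported; the auto-numbered `{}` form was added in 2.7 and 3.1. Folding the line number into the format string also removes the doubled space seen in the output above.

```python
# -*- coding: utf-8 -*-
searchterm = "Байхъусут"
num_matches = 2
filename = "/tmp/ptest/test.txt"
lineno = 1

# Old-style % formatting.
percent_style = "found '%s' %d times in %s on line %d" % (
    searchterm, num_matches, filename, lineno)

# str.format with explicit field numbers (works on 2.6+ and 3.0+).
format_style = "found '{0}' {1} times in {2} on line {3}".format(
    searchterm, num_matches, filename, lineno)

# Auto-numbered fields (2.7+ and 3.1+ only).
auto_numbered = "found '{}' {} times in {} on line {}".format(
    searchterm, num_matches, filename, lineno)

print(format_style)
```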

Thankfully, this works in both 2.7 and 3.2, and in the comp.lang.python thread "New Python 3.0 string formatting - really necessary?" (Google Groups) it is stated:

>> You can use the old 2.x syntax also in Python 3.x:

> Yeah, but it's deprecated, and - as I understand it - may be removed
> completely in future versions. Also, in the future, if you are working
> with code from another developer, it's likely that developer will use
> the new format. I suppose you can use both - but what an awful mess
> that would be.

It's not going to be removed for many years - if ever.

... however, who can be sure how long this will stay true, given that it's deprecated?

So, essentially, I would like to confirm:

How does from __future__ import behave in Python 3? And what happens when Python 4 comes about, and the Python 3 of that time contains deprecated features that will have to be imported from a "future" Python 4?
For a script like this one, which I want to keep in a single .py file and compatible with both Python 2.7 and (hopefully) 3+: am I better off writing my own print function based on sys.stdout.write and using it everywhere, instead of messing with __future__?
Am I also better off using the new string formatting syntax everywhere?

score 2 · Answer 1 · answered Apr 27 '13 at 01:31

Python's from __future__ import feature statement is forward compatible. That is, even after feature becomes standard in a later release, the import statement is still legal.

So rather than doing a bunch of work to get your own print function to work, just unconditionally put this at the top of your file (only comments such as the coding line, blank lines, a docstring, or other __future__ imports may come before it):

from __future__ import print_function

It will just work, forever.
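Applied to the question's script, that advice might look like the following. This is a sketch, not code from the answer: it assumes the same behavior as tscript.py, but wraps the logic in hypothetical `search()` and `main()` functions, uses `os.path.join` instead of string concatenation, folds the line number into the format string, and skips `fileinput` when the directory has no files (an empty list would make `fileinput.input` read stdin).

```python
# -*- coding: utf-8 -*-
"""Count occurrences of a search term in the files of a directory.

A sketch of tscript.py rewritten to run unchanged on Python 2.7 and 3.x.
"""
from __future__ import print_function  # must precede all other statements

import fileinput
import os
import sys


def search(searchterm, indir=os.curdir):
    """Print one line per matching input line; return how many lines matched."""
    filenames = [os.path.join(indir, f) for f in sorted(os.listdir(indir))
                 if os.path.isfile(os.path.join(indir, f))]
    hits = 0
    if not filenames:
        return hits  # fileinput.input([]) would fall back to reading stdin
    for line in fileinput.input(filenames):
        num_matches = line.count(searchterm)  # str method: fine on 2.x and 3.x
        if num_matches:
            hits += 1
            print("found '{0}' {1} times in {2} on line {3}".format(
                searchterm, num_matches, fileinput.filename(),
                fileinput.filelineno()))
    return hits


def main(argv):
    """Parse argv like the original script; return a process exit status."""
    if not 2 <= len(argv) <= 3:
        print("Usage:", argv[0], "searchterm [path]")
        return 1
    search(argv[1], argv[2] if len(argv) == 3 else os.curdir)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run as `python2.7 tscript.py Байхъусут /tmp/ptest` or `python3.2 tscript.py Байхъусут /tmp/ptest`; the same file is used for both.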
