I know I'm quite late with all this, but I have a relatively simple command-line Python script, written for 2.7, which I'd like to make usable on both Python 2.7+ and Python 3+. Since it's a single script:
- I do not want to use six - while six is just a single file, now I'd have to take care of two files (the six module and my script), instead of one
- I do not want to use 2to3; because then again I'd have to take care of two files (the 2.7 version of my script and the 3.2 version of it), instead of one
So, I thought the best approach for me would be to write Python 2.x code that is as compatible with Python 3.x as possible; then I could code once, and not worry if I have to run the script on a USB-thumbdrive OS, which may only have Python 2.7 (or, for that matter, only Python 3+), and for which I may have trouble finding and/or installing the right version of Python.
To demonstrate my problems, here is a sample script based on examples in Learning Python -- Sample chapter 9: Common Tasks in Python - and the preparation in bash on Ubuntu 11.04 (with a bit of Unicode, to spice it up):
cd /tmp
mkdir /tmp/ptest
echo 'Байхъусут, зæрæдтæ!.. Байхъусут, лæппутæ!..' > /tmp/ptest/test.txt
echo 'Байхъусут, зæрæдтæ!.. Байхъусут, лæппутæ!..
Байхъусут зарæгмæ, фыдæлты кадæгмæ,
Дзæбæхдæр бахъырнут уæ бæзджын хъæлæстæй!..' > /tmp/ptest/Байхъусут.txt
cat > tscript.py <<"EOF"
# -*- coding: utf-8 -*-
import fileinput, sys, string, os
if ( len(sys.argv) > 3 ) or ( len(sys.argv) < 2 ):
    print "Usage: ", sys.argv[0], "searchterm [path]"
    sys.exit()
# take the first argument out of sys.argv and assign it to searchterm
searchterm, sys.argv[1:] = sys.argv[1], sys.argv[2:]
if len(sys.argv) == 1: # if no dir is specified,
    indir = os.curdir # use current dir
else: # otherwise, use dir specified
    indir = sys.argv[1] # on the command line
filenames = [indir+"/"+f for f in os.listdir(indir) if os.path.isfile(indir+"/"+f)]
for line in fileinput.input(filenames):
    num_matches = string.count(line, searchterm)
    if num_matches: # a nonzero count means there was a match
        print "found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
            fileinput.filelineno()
EOF
Trying this:
$ python2.7 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line 1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line 1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line 2
$ python3.2 tscript.py Байхъусут /tmp/ptest
File "tscript.py", line 17
print "Usage: ", sys.argv[0], "searchterm [path]"
^
SyntaxError: invalid syntax
Ok, that must be the change of print - will just adding parentheses do? I change it like this:
print ("Usage: ", sys.argv[0], "searchterm [path]")
....
print ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
fileinput.filelineno() )
... will that do?:
$ python3.2 tscript.py Байхъусут /tmp/ptest
Traceback (most recent call last):
File "tscript.py", line 31, in <module>
num_matches = string.count(line, searchterm)
AttributeError: 'module' object has no attribute 'count'
Nope.. so I also change this line:
num_matches = line.count(searchterm) # string.count(line, searchterm)
... is that enough? Well - somewhat, it seems:
$ python3.2 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line 1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line 1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line 2
$ python2.7 tscript.py Байхъусут /tmp/ptest
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 2 times in /tmp/ptest/test.txt on line ", 1)
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 2 times in /tmp/ptest/\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82.txt on line ", 1)
("found '\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82' 1 times in /tmp/ptest/\xd0\x91\xd0\xb0\xd0\xb9\xd1\x85\xd1\x8a\xd1\x83\xd1\x81\xd1\x83\xd1\x82.txt on line ", 2)
Now at least it doesn't crash - but the python 2.7 print sees a tuple, and apparently it doesn't by default decode the string inside that tuple right ...
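To see the tuple effect in isolation, here is a small sketch with made-up values (format_match is a hypothetical helper name, not part of the script). In Python 2, print (a, b) is the print statement applied to a parenthesized tuple, so it shows repr() of each element, and repr() of a byte string escapes the non-ASCII bytes. With a single pre-formatted string the parentheses are plain grouping, so the same line should behave the same on 2.7 and 3.x:

```python
def format_match(searchterm, num_matches, filename, lineno):
    # Build the whole message first, so print only ever receives ONE string.
    return "found '%s' %d times in %s on line %d" % (
        searchterm, num_matches, filename, lineno)

message = format_match("abc", 2, "/tmp/ptest/test.txt", 1)

# One argument: Python 2 treats the parentheses as plain grouping,
# Python 3 treats this as an ordinary function call.
print(message)

# A tuple is displayed through repr() of its elements; this is the form
# that Python 2's print statement ended up showing for print (a, b).
print(repr(("on line ", 1)))
```

This sidesteps the tuple problem only when everything is formatted into one string beforehand; it does not give access to sep= or end=.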
So, apparently, now I want to import print_function from __future__ for python 2.7 (Which python version needs from __future__ import with_statement?); so I try to put this at the top of the file (after the coding statement), thinking that I'd better try to use the import only for the 2.x version:
import __future__, sys
if sys.version_info[0] < 3:
    from __future__ import print_function
else:
    pass
... but I get:
$ python2.7 tscript.py Байхъусут /tmp/ptest
File "tscript.py", line 6
from __future__ import print_function
SyntaxError: from __future__ imports must occur at the beginning of the file
The answer to this, in the question Python graceful future feature (__future__) import is to use a wrapper .py file - but then, I have the same problem again of having to think of two files, instead of one.
I thought I could cheat like this - even if it does create an extra file:
import __future__, sys
if sys.version_info[0] < 3:
    str = """from __future__ import print_function"""
    f = open('compat23.py','w')
    f.write(str)
    f.close()
    import compat23
    print("sys.version_info[0] < 3", end='(')
else:
    print("sys.version_info[0] >= 3", end=')')
... but that doesn't matter really:
$ python2.7 tscript.py Байхъусут /tmp/ptest
File "tscript.py", line 11
print("sys.version_info[0] < 3", end='(')
^
SyntaxError: invalid syntax
... because the __future__ import was valid only for the scope of the newly-created compat23 module, apparently.
So:
- I am apparently making a mistake trying to limit the __future__ import only to versions below 3, given that from __future__ ... is a compile-time statement; but then:
  - How does Python 3 react to this statement? Does it simply get ignored?
  - What happens then, when in Python 4 they decide to deprecate print again - wouldn't then from __future__ import print_function have a meaning again in Python 3, even if it may be ignored in Python 3 currently?
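One way to look at this without guessing is to ask the __future__ module itself. The sketch below only inspects the feature record for print_function; getOptionalRelease() and getMandatoryRelease() are part of the standard __future__ module, while the variable names are mine. The module documentation also states that no feature description will ever be deleted from __future__, which suggests that an unconditional from __future__ import print_function, placed as the first statement of the file, keeps being accepted after the feature has become the default:

```python
import __future__

feature = __future__.print_function

# Release in which the feature first became available through the import.
optional = feature.getOptionalRelease()
# Release in which the feature became the default behaviour.
mandatory = feature.getMandatoryRelease()

print("optional from: %d.%d" % (optional[0], optional[1]))
print("mandatory from: %d.%d" % (mandatory[0], mandatory[1]))
```

Both methods return version tuples in the same five-element shape as sys.version_info, so they can be compared against it directly.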
So, I guess, if I want to avoid thinking about this, and still use a single-file only script, I'm down to the advice in noconv.html: "... or you can use a separate print function that works under both Python 2 and Python 3 .. the trick is to use sys.stdout.write() and formatting ...."; also seen in Eli Bendersky's website » Making code compatible with Python 2 and 3.
And so I try with this at the start of the file, instead of the __future__ import part - and change the corresponding print statements:
def printso(*inargs):
    outstr = ""
    for inarg in inargs:
        outstr += str(inarg) + " "
    outstr += "\n"
    sys.stdout.write(outstr)
....
printso ("Usage: ", sys.argv[0], "searchterm [path]")
....
printso ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
    fileinput.filelineno() )
... and this does, indeed, work fine in both python 2.7 and 3.2:
$ python2.7 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line 1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line 1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line 2
$ python3.2 tscript.py Байхъусут /tmp/ptest
found 'Байхъусут' 2 times in /tmp/ptest/test.txt on line 1
found 'Байхъусут' 2 times in /tmp/ptest/Байхъусут.txt on line 1
found 'Байхъусут' 1 times in /tmp/ptest/Байхъусут.txt on line 2
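A slightly tighter sketch of the same idea, assuming every argument can be passed through str() (true for the byte strings used here on 2.7; unicode objects with non-ASCII characters would need explicit encoding there). The name printso2 and its optional out argument are hypothetical additions, the latter only there to make the function easy to check:

```python
import sys

def printso2(*inargs, **kwargs):
    # 'out' is an optional keyword argument (default: sys.stdout).
    # It is read from **kwargs because Python 2 has no keyword-only syntax.
    out = kwargs.get("out", sys.stdout)
    # Join with single spaces, so no trailing space precedes the newline.
    out.write(" ".join(str(inarg) for inarg in inargs) + "\n")

printso2("found", 2, "matches on line", 1)
```

Unlike printso above, this variant does not leave a trailing space at the end of each line.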
OK, but now it turns out that percent sign % for string formatting is deprecated as well; so instead I should write:
#printso ("found '%s' %d times in %s on line " % ( searchterm, num_matches, fileinput.filename() ), \
# fileinput.filelineno() )
printso ("found '{0}' {1} times in {2} on line ".format(searchterm, num_matches, fileinput.filename() ), \
fileinput.filelineno() )
Thankfully, this works for both 2.7 and 3.2, and in New Python 3.0 string formatting - really necessary? - comp.lang.python | Google Groups it is stated:
>> You can use the old 2.x syntax also in Python 3.x:
> Yeah, but it's deprecated, and - as I understand it - may be removed
> completely in future versions. Also, in the future, if you are working
> with code from another developer, it's likely that developer will use
> the new format. I suppose you can use both - but what an awful mess
> that would be.
It's not going to be removed for many years - if ever.
... however, who can be sure for how long this will stay true, given it's deprecated?
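For reference, both spellings give the same string for this message, so the choice does not affect the output. A small sketch with made-up values (explicit indices such as {0} are used because auto-numbered {} fields only arrived in 2.7 and 3.1):

```python
searchterm, num_matches, filename = "abc", 2, "/tmp/ptest/test.txt"

# Old style: printf-like % operator.
old_style = "found '%s' %d times in %s on line " % (
    searchterm, num_matches, filename)

# New style: str.format() with explicit positional indices.
new_style = "found '{0}' {1} times in {2} on line ".format(
    searchterm, num_matches, filename)

# Both approaches build the identical string.
print(old_style == new_style)
```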
So, essentially - I would like to confirm:
- How does from __future__ import behave in Python 3? What happens when Python 4 comes about, and the Python 3 at that time contains deprecated features, which will have to be imported from "future" Python 4?
- For a script of this character, which I want to keep in a single .py file, and compatible with both Python 2.7 and (hopefully) 3+: am I better off writing my own print function based on sys.stdout.write, and using that everywhere, instead of messing with __future__?
- Am I also better off using the new string formatting syntax everywhere?