There seem to be two interleaved issues here, and I address those first. For how to make both Perl and Python accept either invocation with very similar behavior, see the second part of the post.
Short: They differ in how they do I/O, but both work line-by-line, and the Python code is easily changed to allow the same command-line invocation as the Perl code. Also, both can be written to accept input either from a file or from the standard input stream.
(1) Both of your solutions are "streaming," in the sense that they both process input line-by-line. Perl code reads from STDIN while Python code gets data from a file, but they both get a line at a time. In that sense they are comparable in efficiency for large files.
A standard way to both read and write files line-by-line in Python is
with open('infile', 'r') as fin, open('outfile', 'w') as fout:
    for line in fin:
        fout.write(line.lower())
See, for example, these SO posts on processing a very large file and read-and-write files. The way you read the file seems idiomatic for line-by-line processing; see, for example, SO posts on reading a large file line-by-line, on idiomatic line-by-line reading, and another one on line-by-line reading.
Change the first open here to your io.open to directly take the first argument from the command line as the file name, and add modes as needed.
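As a minimal sketch of that change, the script below takes the input file name from the first command-line argument and lowercases it one line at a time. The function name lowercase_file, its out parameter, and the UTF-8 default are assumptions made for illustration, not part of your code.

```python
import io
import sys


def lowercase_file(path, out=None, encoding="utf-8"):
    """Write the lowercased lines of `path` to `out`, one line at a time.

    `lowercase_file`, `out`, and the utf-8 default are illustrative choices.
    """
    out = sys.stdout if out is None else out
    with io.open(path, "r", encoding=encoding) as fin:
        for line in fin:  # iterating a file object yields one line at a time
            out.write(line.lower())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python lowercase.py input > output
    lowercase_file(sys.argv[1])
```

Only one line is held in memory at a time, so the size of the input file should not matter much.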
(2) The command line with both input and output redirection that you show is a shell feature
./program < input > output
The program is fed lines through the standard input stream (file descriptor 0). They are provided from the file input by the shell via its < redirection. From the GNU Bash manual (see 3.6.1), where "word" stands for our "input":
Redirection of input causes the file whose name results from the expansion of word to be opened for reading on file descriptor n, or the standard input (file descriptor 0) if n is not specified.
Any program can be written to do that, i.e. to act as a filter. For Python you can use
import sys
for line in sys.stdin:
    # each line already ends with a newline, so write it as-is
    sys.stdout.write(line.lower())
See for example a post on writing filters. Now you can invoke it as script.py < input in a shell.
The code writes to standard output, which can then be redirected by the shell using >. Then you get the same invocation as for the Perl script.
I take it that the standard output redirection > is clear in both cases.
Finally, you can bring both to nearly identical behavior, allowing either invocation, in this way.
In Perl, there is the following idiom
while (my $line = <>) {
    # process $line
}
The diamond operator <> either takes line by line from all files submitted on the command line (which are found in @ARGV), or it gets its lines from STDIN (if data is somehow piped into the script). From I/O Operators in perlop:
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk, and any other Unix filter program that takes a list of filenames, doing the same to each line of input from all of them. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
In Python you get practically the same behavior by
import fileinput
for line in fileinput.input():
    pass  # process line
This also goes through the lines of files named in sys.argv[1:], defaulting to sys.stdin if the list is empty. From the fileinput documentation:
This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty. If a filename is '-', it is also replaced by sys.stdin. To specify an alternative list of filenames, pass it as the first argument to input(). A single file name is also allowed.
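The dispatch that both <> and fileinput.input() perform can be sketched roughly as below. This is only an illustration of the logic described in the two quotes, not how either is actually implemented; iter_lines and its stdin parameter are hypothetical names.

```python
import sys


def iter_lines(names, stdin=None):
    """Yield lines from each named file in turn.

    An empty list of names, or the name "-", means standard input.
    `iter_lines` is a hypothetical helper written for illustration.
    """
    stdin = sys.stdin if stdin is None else stdin
    for name in (list(names) or ["-"]):  # empty list falls back to "-"
        if name == "-":
            for line in stdin:
                yield line
        else:
            with open(name) as fh:
                for line in fh:
                    yield line
```

In a real script there should be no need for this; fileinput.input() already covers it.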
In both cases, if there are command-line arguments other than file names, more needs to be done.
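For Python, one possible way to handle that is to parse the options first and pass only the remaining file names to fileinput.input(). The sketch below uses argparse; the --upper option and the main function's argv and out parameters are made up for illustration.

```python
import argparse
import fileinput
import sys


def main(argv=None, out=None):
    """Lowercase lines from the named files, or from stdin if none are given."""
    out = sys.stdout if out is None else out
    parser = argparse.ArgumentParser(description="Change the case of input lines.")
    # --upper is a hypothetical option, here only to show a non-file argument
    parser.add_argument("--upper", action="store_true", help="uppercase instead")
    parser.add_argument("files", nargs="*", help="input files; stdin if none")
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]

    # Pass the file names explicitly so fileinput never sees the options;
    # an empty list makes it fall back to standard input.
    with fileinput.input(files=args.files) as lines:
        for line in lines:
            out.write(line.upper() if args.upper else line.lower())

# In a script, finish with a call to main().
```

With a call to main() at the end of the script, both lowercase input > output and lowercase < input > output should keep working, and lowercase --upper input > output would use the extra option.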
With this you can use both Perl and Python scripts in either way
lowercase < input > output
lowercase input > output
Or, for that matter, as cat input | lowercase > output.
All methods here read input and write output line-by-line. This may be further optimized (buffered) by the interpreter, the system, and the shell's redirections. It is possible to change that so as to read and/or write in smaller chunks, but that would be extremely inefficient and would noticeably slow down programs.