I have six files with the same format but different names (for example, file_AA.dat, file_AB.dat, file_AC.dat, file_BA.dat, file_BB.dat, file_BC.dat).

Can I write a for loop to read, analyze, and write out all of those files in one run, instead of running the script six times? Something like:

for i in [AA, AB, AC, BA, BB, BC]:
    filename = 'file_$i.dat'
    file = open(filename, 'r')
    # Do a lot, lot of analysis for lots of rows and columns :P
    output = open('output_file_$i.dat', 'w')
    # Do some for loop for writing and calculation
    file.close()

So, I hope to automate reading, analyzing, and writing the different (but similarly formatted) files in one run. In particular, I'm curious how to handle the naming of the input and output files. That way, I could analyze a large number of files more quickly and easily.

Or, is there a way to do the same thing with a mix of Python and csh (C shell) or another shell script?

Thank you

3 Answers

The idea is to iterate over file names, open each file in a loop, do the analysis, then write the output file:

filenames = ['file_AA.dat', 'file_AB.dat', 'file_AC.dat', 'file_BA.dat', 'file_BB.dat', 'file_BC.dat']

for filename in filenames:
    with open(filename, 'r') as input_file:
        # Do a lot, lot of analysis for lots of rows and columns :P
        data = input_file.read()  # placeholder for the real analysis

    with open('output_%s' % filename, 'w') as output_file:
        # Do some for loop for writing and calculation
        output_file.write(data)  # placeholder for the real output

Note that using the with statement is recommended when working with files, since it closes them automatically, even if an exception occurs.

Also note that you can combine the two with statements into one.
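
For example, a combined version might look like this (a minimal sketch; the sample inputs are created up front only so the snippet runs on its own):

```python
# Create two small sample inputs so the sketch runs standalone
# (the names mirror the question; any file names work).
for name in ['file_AA.dat', 'file_AB.dat']:
    with open(name, 'w') as f:
        f.write('some data for %s\n' % name)

filenames = ['file_AA.dat', 'file_AB.dat']

for filename in filenames:
    # Both files are opened in a single with statement and are
    # closed automatically when the block ends.
    with open(filename) as input_file, open('output_%s' % filename, 'w') as output_file:
        output_file.write(input_file.read())
```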

UPD: you can use string formatting for constructing the list of filenames:

>>> patterns = ['AA', 'AB', 'AC', 'BA', 'BB', 'BC']
>>> filenames = ['file_{}.dat'.format(pattern) for pattern in patterns]
>>> filenames
['file_AA.dat', 'file_AB.dat', 'file_AC.dat', 'file_BA.dat', 'file_BB.dat', 'file_BC.dat']
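
As an alternative (my addition, not part of the answer above): if the data files are the only ones matching the pattern in the working directory, the glob module can build the list for you instead of spelling it out:

```python
import glob

# Create empty sample files so the sketch runs standalone.
for suffix in ['AA', 'AB', 'AC']:
    open('file_%s.dat' % suffix, 'w').close()

# 'file_??.dat' matches file_ + any two characters + .dat;
# sorted() gives a stable, predictable order.
filenames = sorted(glob.glob('file_??.dat'))
```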

Hope that helps.

files = [
    "file_AA.dat",
    "file_AB.dat",
    "file_AC.dat",
    "file_BA.dat",
    "file_BB.dat",
    "file_BC.dat",
]
for filename in files:
    f = open(filename)
    data = f.read()  # reads all data from the file into a string
    f.close()
    # parse data here and do other stuff
    junk = data  # junk is a string that you shove the results into
    output = open("output_" + filename, 'w')
    output.write(junk)
    output.close()

If you have tons of files and you are doing computation-heavy analysis of the data in them, you can use the multiprocessing module. As for bash vs. Python: I basically use a Python interpreter the same way lots of people use a bash shell, and I almost never have a reason to leave the Python interpreter. Also, if these files are the only files in a directory, you can use the os module to walk the directory. If you must run a program in a bash shell, you can use the subprocess module.

You can use a list comprehension to do this cleanly:

for filein, fileout in [('file_%s.dat' % x, 'out_%s.dat' % x) for x in ('AA', 'AB', 'AC', 'BA', 'BB', 'BC')]:
    with open(filein, 'rb') as fp, open(fileout, 'w') as fpout:
        # Read from fp, write to fpout as needed
        pass

This list comprehension creates the list of input/output file pairs:

[('file_%s.dat' % x, 'out_%s.dat' %x) for x in ('AA','AB','AC', 'BA', 'BB', 'BC')]

This generates a list that looks like:

[('file_AA.dat', 'out_AA.dat'), ('file_AB.dat', 'out_AB.dat') ...]

You can try this out to see how it works:

lst = [('file_%s.dat' % x, 'out_%s.dat' % x) for x in ('AA', 'AB', 'AC', 'BA', 'BB', 'BC')]
print(lst)

for filein, fileout in lst:
    with open(filein, 'rb') as fp, open(fileout, 'w') as fpout:
        # Read from fp, write to fpout as needed
        pass