1

I'd like to read the contents from several files into unique lists that I can call later - ultimately, I want to convert these lists to sets and perform intersections and subtraction on them. This must be an incredibly naive question, but after poring over the iterators and loops sections of Lutz's "Learning Python," I can't seem to wrap my head around how to approach this. Here's what I've written:

#!/usr/bin/env python

import sys

OutFileName = 'test.txt'
OutFile = open(OutFileName, 'w')

FileList = sys.argv[1: ]
Len = len(FileList)
print Len

for i in range(Len):
    sys.stderr.write("Processing file %s\n" % (i))
    FileNum = i
    
for InFileName in FileList:
    InFile = open(InFileName, 'r')
    PathwayList = InFile.readlines()
    print PathwayList
    InFile.close()

With a couple of simple test files, I get output like this:

Processing file 0

Processing file 1

['alg1\n', 'alg2\n', 'alg3\n', 'alg4\n', 'alg5\n', 'alg6']

['csr1\n', 'csr2\n', 'csr3\n', 'csr4\n', 'csr5\n', 'csr6\n', 'csr7\n', 'alg2\n', 'alg6']

These lists are correct, but how do I assign each one to a unique variable so that I can call them later (for example, by including the index # from range in the variable name)?

Thanks so much for pointing a complete programming beginner in the right direction!

Community
  • 1
  • 1
pandaSeq
  • 213
  • 2
  • 6
  • 12

6 Answers6

2
#!/usr/bin/env python

import sys

FileList = sys.argv[1: ]
PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % (i))
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()

Assuming you read in two files, the following will do a line by line comparison (it won't pick up any extra lines in the longer file, but then they'd not be the same if one had more lines than the other ;)

for i, s in enumerate(zip(PathwayList[0], PathwayList[1]), 1):
    if s[0] == s[1]:
        print i, 'match', s[0]
    else:
        print i, 'non-match', s[0], '!=', s[1]

For what you're wanting to do, you might want to take a look at the difflib module in Python. For sorting, look at Mutable Sequence Types, someListVar.sort() will sort the contents of someListVar in place.

John Gaines Jr.
  • 11,174
  • 1
  • 25
  • 25
  • I'm stuck on how I save the PathwayList from each file as a unique variable (so that I can compare the list from file 1 to the list from file 2). The answer that was up about dynamic assignment of variables seems to point in the right direction - I'm looking into that at the moment. – pandaSeq Sep 13 '11 at 20:50
  • PathwayList.append = ... will not achieve what you want. – rocksportrocker Sep 13 '11 at 21:28
  • I see the idea, but my lists will be of different lengths, and not necessarily sorted. I can envision cases where this will be what I want, though, so thanks! – pandaSeq Sep 13 '11 at 21:31
1

You could do it like that if you don't need to remeber where the contents come from :

PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()  

for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents  

or, if you want to keep track of the files names, you could use a dictionary :

PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFile] = InFile.readlines()
    InFile.close()

for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents  
dugres
  • 12,613
  • 8
  • 46
  • 51
1

You might want to check out Python's fileinput module, which is a part of the standard library and allows you to process multiple files at once.

jathanism
  • 33,067
  • 9
  • 68
  • 86
1

Essentially, you have a list of files and you want to change to list of lines of these files...

Several ways:

result = [ list(open(n)) for n in sys.argv[1:] ]

This would get you a result like -> [ ['alg1', 'alg2', 'alg3'], ['csr1', 'csr2'...]] Accessing would be like 'result[0]' which would result in ['alg1', 'alg2', 'alg3']...

Somewhat better might be dictionary:

result = dict( (n, list(open(n))) for n in sys.argv[1:] )

If you want to just concatenate, you would just need to chain it:

import itertools
result = list(itertools.chain.from_iterable(open(n) for n in sys.argv[1:]))
# -> ['alg1', 'alg2', 'alg3', 'csr1', 'csr2'...

Not one-liners for a beginner...however now it would be a good exercies to try to comprehend what's going on :)

ondra
  • 9,122
  • 1
  • 25
  • 34
0

You need to dynamically create the variable name for each file 'number' that you're reading. (I'm being deliberately vague on purpose, knowing how to build variables like this is quite valuable and more readily remembered if you discover it yourself)

something like this will give you a start

Community
  • 1
  • 1
KevinDTimm
  • 14,226
  • 3
  • 42
  • 60
0

You need a list which holds your PathwayList lists, that is a list of lists.

One remark: it is quite uncommon to use capitalized variable names. There is no strict rule for that, but by convention most people only use capitalized names for classes.

rocksportrocker
  • 7,251
  • 2
  • 31
  • 48
  • OK, the list of lists seems intuitive to me - I can just call each list by its index. It seems that there is probably also a solution involving a dictionary - am I thinking along the right lines? Thanks for the tip about capitalization - I learned this from the book "Practical Computing for Biologists," which I loved for its assumption of no prior knowledge but am learning doesn't necessarily teach the "best" ways of doing things. I don't even understand what classes are yet, but I'll start using lowercase before it becomes a bad habit :) – pandaSeq Sep 13 '11 at 21:00
  • Look at @durges solution. He shows how to use a list or how to use a dictionary. – rocksportrocker Sep 13 '11 at 21:29