My suggestion is to use a dictionary keyed on the 5 digit line type code. Each value in the dictionary can be a list of field offsets (or of (offset, width) tuples), indexed by field position.
If your fields have names it may be convenient to use a class instead of a list to store field offset data. However, namedtuples
may be better here, since then you can access your field offset data either via its name or by its field position, so you get the best of both worlds.
namedtuple
s are actually implemented as classes, but defining a new namedtuple
type is much more compact that creating an explicit class definition, and namedtuples
use the __slots__
protocol, so they take up less RAM than a normal class that uses __dict__
for storing its attributes.
Here's one way to use namedtuples
to store field offset data. I'm not claiming that the following code is the best way to do this, but it should give you some ideas.
from collections import namedtuple
#Create a namedtuple, `Fields`, containing all field names
fieldnames = [
'record_type',
'special',
'communication',
'id_number',
'transaction_code',
'amount',
'other',
]
Fields = namedtuple('Fields', fieldnames)
#Some fake test data
data = [
# 1 2 3 4 5
#012345678901234567890123456789012345678901234567890123
"12455WE READ THIS TOO796445 125997 554777",
"22455 888AND THIS TOO796445 125997 55477778 2 1",
]
#A dict to store the field (offset, width) data for each field in a record,
#keyed by record type, which is always stored at (0, 5)
offsets = {}
#Some fake record structures
offsets['12455'] = Fields(
record_type=(0, 5),
special=None,
communication=(5, 28),
id_number=(33, 6),
transaction_code=(40, 6),
amount=(48, 6),
other=None)
offsets['22455'] = Fields(
record_type=(0, 5),
special=(6, 3),
communication=(9, 18),
id_number=(27, 6),
transaction_code=(34, 6),
amount=(42, 8),
other=(51,3))
#Test.
for row in data:
print row
#Get record type
rt = row[:5]
#Get field structure
fields = offsets[rt]
for name in fieldnames:
#Get field offset data by field name
t = getattr(fields, name)
if t is not None:
start, flen = t
stop = start + flen
data = row[start : stop]
print "%-16s ... %r" % (name, data)
print
output
12455WE READ THIS TOO796445 125997 554777
record_type ... '12455'
communication ... 'WE READ THIS TOO'
id_number ... '796445'
transaction_code ... '125997'
amount ... '554777'
22455 888AND THIS TOO796445 125997 55477778 2 1
record_type ... '22455'
special ... '888'
communication ... 'AND THIS TOO'
id_number ... '796445'
transaction_code ... '125997'
amount ... '55477778'
other ... '2 1'