1

Possible Duplicate:
Efficient way of parsing fixed width files in Python

Not even sure if "space delimited" is really the right term here (which is probably what is hindering my search efforts). Basically, field X begins at character 0 and field Y begins at character 30. Pretty sure this is an ancient file format that one of our systems still uses. I could roll my own solution easily, but I would rather use an existing library if one exists.

Scott
  • You're right, "space delimited" probably isn't the best term, because it could also mean a CSV dialect that uses spaces instead of commas… but your explanation makes it perfectly clear what you actually want, so I wouldn't worry about it too much. – abarnert Dec 08 '12 at 02:16

3 Answers

3

This question looks pretty similar to yours. It looks like they had some suggestions of which modules would be most useful:

How to efficiently parse fixed width files?

Rachel Sanders
2
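with open('myfile.txt') as f:
  for line in f:  # iterate over the file directly instead of loading it all with readlines()
    # x is columns 0-27; y is column 29 onward (still carrying the trailing newline)
    x, y = line[:28], line[29:]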

This should separate the x and y fields from each line.

Aesthete
  • That is really the best approach for a fixed width file, just split the strings at the appropriate places. – TimothyAWiseman Dec 08 '12 at 01:14
  • The `line.split()` here is going to break things. For example, with `'1234567890123456789012345 abcdef'`, it will set `line = ['1234567890123456789012345', 'abcdef']`, and then `line[:28]` will raise an index error. If you take the `split` line out, then this is the right answer. – abarnert Dec 08 '12 at 02:18
  • @abarnert - Thanks, I read the question differently, meaning there are n amount of values, space delimited, and the first 30 are x, the rest are y. – Aesthete Dec 08 '12 at 04:17
  • +1 now that's fixed. But a few last comments: Many fixed-width formats allow the entire column to be used with no space at all, so `12345678901234567890123456abcdef` where the fields are `12345678901234567890123456` and `abcdef`. It doesn't sound like that's true in the OP's case, but in general, you probably want `line[:29], line[29:]` rather than `line[:28], line[29:]`. Also, you probably want `line[:29].rstrip()` to get rid of the excess spaces. And finally, why are you starting field `Y` at column 29 instead of the OP's 30? – abarnert Dec 10 '12 at 00:47
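
The points raised in the comments above (slice at the same offset on both sides, strip the padding, start the second field at the question's column 30) can be sketched roughly as follows. The helper name `split_fixed`, the `starts` parameter, and the sample record are made up for illustration; the offsets (0, 30) are taken from the question and would need adjusting for the real layout.

```python
def split_fixed(line, starts=(0, 30)):
    """Split one line into fields that begin at the given column offsets."""
    # Drop the line ending so it does not end up inside the last field.
    line = line.rstrip("\r\n")
    # Pair each start offset with the next one; the last field runs to end of line.
    ends = list(starts[1:]) + [None]
    # Slicing never raises on short lines; missing fields come back as ''.
    return [line[a:b].rstrip() for a, b in zip(starts, ends)]


# Hypothetical record: field X padded to 30 characters, then field Y.
sample = "Alice".ljust(30) + "Engineering\n"
x, y = split_fixed(sample)
print(repr(x), repr(y))
```

Because both slices share the same boundary, a first field that fills all 30 columns with no trailing space is still split correctly.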
-1

Subclass csv.Dialect as follows:

import csv

class SpaceCsv(csv.Dialect):
    "csv format for exporting tables"
    delimiter = None
    doublequote = True
    escapechar = None
    lineterminator = '\n'
    quotechar = '"'
    skipinitialspace = True
    quoting = csv.QUOTE_MINIMAL
csv.register_dialect('space', SpaceCsv)

Then use this as csv.reader(filename, dialect="space"). Let me know how you get on...

hd1
  • This will raise a `TypeError: delimiter must be set`. And, even if it worked, it's not going to split on fixed-width columns, which is what the OP asked for. – abarnert Dec 08 '12 at 02:20
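
Since `csv` splits on a delimiter rather than on column positions, a fixed-width reader has to work from field widths instead. One standard-library option is the `struct` module; the following is only a minimal sketch of that idea, not something proposed in this thread. The widths (30 and 10), the names `FIELD_WIDTHS`, `record`, and `parse_record`, and the ASCII encoding are all assumptions for illustration.

```python
import struct

# Assumed layout: a 30-byte field followed by a 10-byte field.
FIELD_WIDTHS = (30, 10)
record = struct.Struct("".join(f"{w}s" for w in FIELD_WIDTHS))


def parse_record(raw):
    """Unpack one fixed-width record (bytes) into stripped text fields."""
    # unpack() needs exactly record.size bytes, so pad short lines with
    # spaces and cut off anything beyond the declared widths.
    padded = raw.rstrip(b"\r\n").ljust(record.size)[:record.size]
    # Assumes the file is ASCII; change the codec for other encodings.
    return [field.decode("ascii").rstrip() for field in record.unpack(padded)]


# Hypothetical record, as it would come from a file opened in binary mode.
sample = b"Alice".ljust(30) + b"Eng\n"
print(parse_record(sample))
```

For only two fields, plain slicing is simpler; a precompiled `struct.Struct` mainly pays off when a record has many columns.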