
If I have a large text file, and I've isolated a specific line from the text, for instance:

ANBL  1      2    345678    0.9   01.2  34                56.     7.8

Such that:

Column 1-4 = "ANBL"
Column 6-7 = " 1"
Column 10-11 = "  " (blank)
Column 13-14 = " 2"
etc.,

is there an efficient way to read the information in columns 6-7, 10-11, and 13-14? When decoding the file, some values may be present while others are not. However, I know the exact column numbers where each value would appear if it were present. A similar question has been asked here, but its accepted answer does not work in this situation: calling .split() on the line collapses runs of whitespace, so blank fields are skipped and the remaining values no longer line up with their columns.
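A minimal sketch of the .split() problem, using two hypothetical lines (made up for illustration, not taken from the real file) that are identical except that columns 10-11 hold "99" in one and are blank in the other:

```python
# Hypothetical lines: columns 10-11 are "99" in `filled` and blank in `blank`.
filled = "ANBL  1  99  2"
blank = "ANBL  1      2"

# str.split() with no arguments collapses runs of whitespace, so the blank
# field disappears and every later value moves one position to the left.
print(filled.split())  # ['ANBL', '1', '99', '2']
print(blank.split())   # ['ANBL', '1', '2']

# Slicing by character position keeps each field in place.
# Python slices are 0-based and end-exclusive, so columns 10-11 -> [9:11].
print(repr(filled[9:11]), repr(blank[9:11]))    # '99' '  '
print(repr(filled[12:14]), repr(blank[12:14]))  # ' 2' ' 2'
```

With .split(), index 2 is "99" in one line and "2" in the other, so the position in the token list no longer identifies the column.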

To clarify, the columns are separated by spaces, and the values differ from line to line; a field that is filled in one line may be blank in another. For example, compare the following lines:

F014785236969    2          5  4  7.00 41.00   9    3.11         5.4         1.1
AJ51648705469    3        003002  1.60 13.00  17     7.0   6.0   5.4 20.00   2.2
AaronJPung

1 Answer


If the data in your file is structured so that each "column" always occupies the same character positions, you can use slices to get what you need. Python slices are 0-based and end-exclusive, so columns 1-4 correspond to line[0:4]:

line = "ANBL  1      2    345678    0.9   01.2  34                56.     7.8"
print(line[0:4]) # >> ANBL
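Since the question numbers its columns 1-based and inclusive, a small helper can do the off-by-one conversion once. This is a sketch; the name `field` and the choice to return None for blank fields are assumptions for illustration, not part of the original answer:

```python
def field(line, first, last):
    """Return columns first..last (1-based, inclusive), stripped.

    Hypothetical helper: a blank field comes back as None instead of ''.
    """
    value = line[first - 1:last].strip()
    return value or None


line = "ANBL  1      2    345678    0.9   01.2  34                56.     7.8"
wanted = [(6, 7), (10, 11), (13, 14)]  # the column ranges from the question
print([field(line, a, b) for a, b in wanted])  # ['1', None, '2']
```

Returning None makes a missing value explicit, which is easier to test for than an empty or all-space string.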

If you know where each column starts and ends, you can define those positions ahead of time and slice every line with them (using the lines from your second example as the contents of test.txt):

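# Start/end positions of each column (0-based, end-exclusive)
cols = [(0, 13), (17, 18), (26, 29), (29, 32), (33, 38), (39, 44)]

with open('test.txt', 'r') as f:
    # Iterate over the file object directly so a large file is read
    # one line at a time instead of all at once with readlines()
    for line in f:
        data = [line[start:end] for start, end in cols]
        print(data)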

Prints:

['F014785236969', '2', '  5', '  4', ' 7.00', '41.00']
['AJ51648705469', '3', '003', '002', ' 1.60', '13.00']
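The slices above still contain padding spaces and are all strings. A possible next step, sketched under assumptions, is to strip each field, turn blanks into None, and convert the rest. The per-column types in `CONVERTERS` are a guess based on how the sample values look; adjust them to the real file format. `io.StringIO` stands in for `open('test.txt')` so the example is self-contained:

```python
import io

# Column boundaries from the answer (0-based, end-exclusive).
COLS = [(0, 13), (17, 18), (26, 29), (29, 32), (33, 38), (39, 44)]
# Hypothetical per-column types: the ID stays a string, the rest are numbers.
CONVERTERS = [str, int, int, int, float, float]


def parse_line(line):
    """Slice a fixed-width line; blank fields become None."""
    record = []
    for (start, end), convert in zip(COLS, CONVERTERS):
        raw = line[start:end].strip()  # slicing past the end just yields ''
        record.append(convert(raw) if raw else None)
    return record


sample = io.StringIO(
    "F014785236969    2          5  4  7.00 41.00   9    3.11         5.4         1.1\n"
    "AJ51648705469    3        003002  1.60 13.00  17     7.0   6.0   5.4 20.00   2.2\n"
)
for line in sample:
    print(parse_line(line))
# ['F014785236969', 2, 5, 4, 7.0, 41.0]
# ['AJ51648705469', 3, 3, 2, 1.6, 13.0]
```

Because out-of-range slices return an empty string rather than raising, a short line simply yields None for its missing trailing fields.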