
I am new to Python, and although I am sure this might be a trivial question, I have spent my day trying to solve it in different ways. I have a file containing data that looks like this:

<string>
<integer>
<N1>
<N2>
data
data
...
<string>
<integer>
<N3>
<N4>
data
data
...

And that pattern repeats a number of times... I need to read the "data", which for the first set (between the first and second <string>) contains N1 X points, N2 Y points and N1*N2 Z points. If I had only one set of data, I already know how to handle it: read all the data, read the values N1 and N2, then slice it into X, Y and Z, reshape it and use it. But if my file contains more than one set of data, how do I read only from one <string> until the next one, repeat the same operation for the next set, and so on until I reach the end of the file? I tried defining a function like:

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line):
                break
            for line in ifile:
                yield line

but it is not working; I get arrays with no data in them. Any comments will be appreciated. Thanks!

jealopez
  • Is this an XML file? If it is, you could use Python's built-in XML parsing module. – John Jul 02 '13 at 22:14
  • It's a plain text file, @johnthexiii. I want all the sets; some files contain two sets, some more. From each set I have to use X, Y and Z to create some plots, which I've managed to do when I manually create separate files with only one <string>, <integer>, <N1>, <N2>, data ... block per file. I want to read one set until the next <string> is reached (and use its data), then read through the next set until the following <string> is reached, and so on until the end of the file. Thanks! – jealopez Jul 02 '13 at 22:38

3 Answers


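The test isinstance('line', str) checks the string literal 'line', not your variable, so it is always true (and every line read from a text file is a str anyway). You therefore break out on the first line. Remove that test, and test for an empty line by stripping away whitespace first: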

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line.strip():
                break
            yield line

I don't think you need to break at an empty line, really; the for loop ends on its own at the end of the file.
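To see why the original function yields nothing, here is a small check. It uses io.StringIO as a stand-in for inpfile.txt (an assumption made only so the snippet runs without a file), and original_dat_fun / fixed_dat_fun are hypothetical names for the question's version and a version without the break:

```python
import io

def original_dat_fun(ifile):
    # The question's logic: isinstance('line', str) tests the literal 'line',
    # which is always a str, so the loop breaks on the very first line.
    for line in ifile:
        if isinstance('line', str) or (not line):
            break
        for line in ifile:
            yield line

def fixed_dat_fun(ifile):
    # No break needed: the for loop stops by itself at end of file.
    for line in ifile:
        yield line

sample = "name\n7\n2\n3\n"
print(list(original_dat_fun(io.StringIO(sample))))  # []
print(list(fixed_dat_fun(io.StringIO(sample))))     # ['name\n', '7\n', '2\n', '3\n']
```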

If your lines contain other kinds of data, such as numbers, you need to convert them yourself from the strings (with int() or float()).
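A minimal sketch of that conversion, using a hypothetical helper parse_number: count lines such as N1 and N2 come back as int, data lines as float, and a non-numeric line (such as a set name) raises ValueError.

```python
def parse_number(text):
    """Convert one line of the file to int if possible, otherwise to float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        # float() raises ValueError for non-numeric text such as a set name
        return float(text)

lines = ["3\n", "4\n", "0.25\n", "-1.5e3\n"]
print([parse_number(line) for line in lines])  # [3, 4, 0.25, -1500.0]
```

A data value written without a decimal point, such as 5, would come back as an int here; calling float() directly on the data lines avoids that.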

Martijn Pieters
  • So the first test, checking whether it's a string, doesn't distinguish the floats (the "data") or the integers (N1, N2, N3, ...) from the alphabetic strings? My data are numbers, and I want to get arrays of data: say data1 for the data between the first and second <string>, data2 for the data between the second and third, and so on until the end of the file. Thanks! – jealopez Jul 02 '13 at 22:43
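Every line read from a text file is a str, so isinstance cannot tell the name lines apart from the numbers; trying a numeric conversion can. The sketch below (hypothetical names is_number and split_sets) starts a new set at each line that does not parse as a number. It assumes dataset names never look numeric (note that float() also accepts words like nan and inf), and io.StringIO stands in for the real file:

```python
import io

def is_number(text):
    """Return True if text parses as a float (integers such as '3' count too)."""
    try:
        float(text)
    except ValueError:
        return False
    return True

def split_sets(ifile):
    """Group lines into (name, [numeric lines]) pairs.

    A new set starts at every non-numeric line, which is assumed to be a
    dataset name. Blank lines and numeric lines before the first name are ignored.
    """
    sets = []
    for line in ifile:
        line = line.strip()
        if not line:
            continue
        if not is_number(line):
            sets.append((line, []))       # start a new set named by this line
        elif sets:
            sets[-1][1].append(line)      # numeric line belongs to the current set
    return sets

sample = "first\n7\n1\n2\n0.1\n0.2\n0.3\n1.0\n2.0\nsecond\n9\n1\n1\n5.0\n6.0\n7.0\n"
for name, values in split_sets(io.StringIO(sample)):
    print(name, len(values))
# first 8
# second 6
```

Each values list still begins with the unused integer, N1 and N2, so int(values[1]) and int(values[2]) give the counts needed to slice the rest into X, Y and Z.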

With structured data like this, I'd suggest just reading what you need. For example:

with open("inpfile.txt", "r") as ifile:
    first_string = ifile.readline().strip()  # Is this the name of the data set?
    first_integer = int(ifile.readline())  # You haven't told us what this is, either
    n_one = int(ifile.readline())
    n_two = int(ifile.readline())

    x_vals = []
    y_vals = []
    z_vals = []

    for _ in range(n_one):
        x_vals.append(ifile.readline().strip())
    for _ in range(n_two):
        y_vals.append(ifile.readline().strip())
    for _ in range(n_one * n_two):
        z_vals.append(ifile.readline().strip())

You can turn this into a dataset generating function by adding a loop and yielding the values:

def dat_fun(filename="inpfile.txt"):
    with open(filename, "r") as ifile:
        while True:
            first_string = ifile.readline().strip()  # Is this the name of the data set?
            if first_string == '':
                # '' means end of file (a blank line would also stop the loop)
                break
            first_integer = int(ifile.readline())  # You haven't told us what this is, either
            n_one = int(ifile.readline())
            n_two = int(ifile.readline())

            x_vals = []
            y_vals = []
            z_vals = []

            for _ in range(n_one):
                x_vals.append(ifile.readline().strip())
            for _ in range(n_two):
                y_vals.append(ifile.readline().strip())
            for _ in range(n_one * n_two):
                z_vals.append(ifile.readline().strip())
            yield (x_vals, y_vals, z_vals)  # and the first string and integer if you need those

# Usage: for x_vals, y_vals, z_vals in dat_fun(): ...
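Each yielded z_vals is still a flat list of N1*N2 strings, and N1 and N2 can be recovered as len(x_vals) and len(y_vals). The question mentions reshaping it; a minimal pure-Python sketch with a hypothetical helper to_grid follows. It assumes the Z values are stored row by row with the Y index varying fastest; the question does not say which order the file uses, so swap the roles of n_one and n_two if it is the other way round. With NumPy, numpy.array(z_vals, dtype=float).reshape(n_one, n_two) does the same under the same ordering assumption.

```python
def to_grid(z_vals, n_one, n_two):
    """Turn the flat list of N1*N2 Z strings into n_one rows of n_two floats.

    Assumes the Z values are stored row by row, i.e. the index that runs
    over the N2 Y points changes fastest.
    """
    if len(z_vals) != n_one * n_two:
        raise ValueError("expected %d Z values, got %d" % (n_one * n_two, len(z_vals)))
    z = [float(v) for v in z_vals]
    # Slice the flat list into consecutive rows of length n_two
    return [z[row * n_two:(row + 1) * n_two] for row in range(n_one)]

grid = to_grid(["1", "2", "3", "4", "5", "6"], 2, 3)
print(grid)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```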
Rob Watts
  • Thank you very much! I think this would be the way to do it if I were interested only in the first set of data, but I would like to go through the whole file and put every set of data into a different array (say, into data1 the data between the first and second <string>, into data2 the data between the second and third, and so on until the end of the file). The "first_integer" has more to do with the process that generates that particular set of data, so I am not interested in it, only in n_one and n_two... – jealopez Jul 02 '13 at 22:54
  • But I think once I understand how to put the data between strings into arrays, it will be easier for me to figure out how to read n_one, n_two and so on as integers. Thanks. – jealopez Jul 02 '13 at 22:56
  • And yes, the first string (and each string) is the name of the corresponding data set. – jealopez Jul 02 '13 at 23:14
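Since each header string is the dataset's name, the sets can be stored in a dict keyed by that name instead of separate variables data1, data2, and so on. A minimal sketch under the question's layout, assuming no blank lines inside a set and unique names (a repeated name would overwrite the earlier set). The hypothetical function load_all takes an open file object and uses itertools.islice to take exactly N1, N2 and N1*N2 lines from one shared iterator; io.StringIO stands in for the real file in the example.

```python
import io
from itertools import islice

def load_all(ifile):
    """Return {name: (x, y, z)} for every set in the file.

    Assumes each set is: name, an unused integer, N1, N2, then N1 X values,
    N2 Y values and N1*N2 Z values, one per line.
    """
    lines = (line.strip() for line in ifile)   # lazily strip every line
    datasets = {}
    for name in lines:                         # each pass starts at a name line
        if not name:
            continue                           # skip blank lines between sets
        next(lines)                            # skip the unused integer
        n_one = int(next(lines))
        n_two = int(next(lines))
        x = [float(v) for v in islice(lines, n_one)]
        y = [float(v) for v in islice(lines, n_two)]
        z = [float(v) for v in islice(lines, n_one * n_two)]
        datasets[name] = (x, y, z)
    return datasets

sample = "first\n7\n1\n2\n0.1\n0.2\n0.3\n1.0\n2.0\nsecond\n9\n1\n1\n5.0\n6.0\n7.0\n"
data = load_all(io.StringIO(sample))
print(data["first"])   # ([0.1], [0.2, 0.3], [1.0, 2.0])
print(data["second"])  # ([5.0], [6.0], [7.0])
```

With a real file this would be called as load_all(ifile) inside a with open("inpfile.txt") as ifile: block; data["first"] then plays the role of data1. A truncated last set makes next() raise StopIteration.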
def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line): # 'line' is always a str, and so is the line itself
                break 
            for line in ifile:
                yield line

Change this to:

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line:
                break
            yield line
Rushy Panchal
  • `not line` is not likely to ever be `True`; all but the last line will have a newline, and even then the last line wouldn't be empty. – Martijn Pieters Jul 02 '13 at 22:50
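This is easy to check: lines read by iterating over a file keep their trailing newline, so a blank line is '\n', which is truthy. Stripping first makes the blank-line test work. io.StringIO stands in for a real file here:

```python
import io

lines = list(io.StringIO("name\n\n3\n"))
print(lines)                                 # ['name\n', '\n', '3\n']
print([not line for line in lines])          # [False, False, False]
print([not line.strip() for line in lines])  # [False, True, False]
```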