
I have a text file consisting of data in tab-delimited columns. There are many ways to read data from a file into Python, but I am specifically trying to use a method similar to the one outlined below. When using a context manager like with open(...) as ..., I've seen that the general convention is to indent all of the subsequent code within the with statement. Yet when defining a function, the return statement is usually placed at the same indentation as the first line of code within the function (excluding cases with awkward if-else branches). In this case, both approaches work. Is one method considered correct or generally preferred over the other?

import numpy as np

def read_in(fpath, contents=None, row_limit=np.inf):
    """
    fpath is file location + filename + '.txt'
    contents is the initial data that the file data will be appended to
    row_limit is the maximum number of rows to be read (in case one would like to not read in every row).
    """
    if contents is None:
        contents = []  # a fresh list per call; a default of [] would be shared across calls
    nrows = 0
    with open(fpath, 'r') as f:
        for row in f:
            if nrows < row_limit:
                contents.append(row.split())
                nrows += 1
            else:
                break

        # return contents
    return contents

Below is a snippet of the text-file I am using for this example.

1996 02 08  05 17 49    263     70    184     247    126      0     -6.0    1.6e+14    2.7e+28    249
1996 02 12  05 47 26     91     53    160     100    211    236      2.0    1.3e+15    1.6e+29     92
1996 02 17  02 06 31    279     73    317     257    378    532      9.9    3.3e+14    1.6e+29    274
1996 02 17  05 18 59     86     36    171      64    279    819     27.9     NaN      NaN      88
1996 02 19  05 15 48     98     30    266     129    403    946     36.7     NaN      NaN      94
1996 03 02  04 11 53     88     36    108      95    120    177      1.0    1.5e+14    8.7e+27     86
1996 03 03  04 12 30     99     26    186     141    232    215      2.3    1.6e+14    2.8e+28     99

And below is a sample call.

fpath = "/Users/.../sample_data.txt"
data_in = read_in(fpath)
for i in range(len(data_in)):
    print(data_in[i])

(I realize that it's better to use chunks of pre-defined sizes to read in data, but the number of characters per row of data varies. So I'm instead trying to give the user control over the number of rows read in; one could read in a subset of the rows at a time and append them to contents, passing contents back into read_in repeatedly, possibly in a loop, if the file is large enough. That said, I'd love to know if I'm wrong about this approach as well, though this isn't my main question.)

2 Answers


If your function needs to do other things after writing to the file, you usually do them outside the with block. In that case the return statement ends up outside the with block too.

However, if the purpose of your function is just to read in a file, you can return within the with block or outside it. I don't believe either approach is preferred in this case.
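
To illustrate that the two placements behave the same for a read-only function, here is a minimal sketch. The function names and the two-row temporary file are made up for the demo and are not from the question; only the placement of return differs between the two functions.

```python
import os
import tempfile

def read_return_inside(fpath):
    """Return from inside the with block; the file is still closed on the way out."""
    with open(fpath, "r") as f:
        return [row.split() for row in f]

def read_return_outside(fpath):
    """Build the result inside the with block, return after the file is closed."""
    with open(fpath, "r") as f:
        contents = [row.split() for row in f]
    return contents

# Hypothetical two-row sample file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1996 02 08\t263\t70\n1996 02 12\t91\t53\n")
    sample_path = tmp.name

inside = read_return_inside(sample_path)
outside = read_return_outside(sample_path)
os.remove(sample_path)

print(inside == outside)  # both variants produce the same rows
```

With nothing left to do after the file is read, the choice comes down to readability.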

I don't really understand your second question.

hsnsd
  • Regarding the second question, [this post](https://stackoverflow.com/a/519653/7345804) among others suggests using a pre-defined number of characters (chunk size) to read data from a file to conserve memory. I'm not sure what to set as a chunk size since the number of characters per row in the data file is not constant, hence the `for row in f` approach. I was trying to say that one could read some of the rows in at a time, store them in `contents`, and continue reading the file where it left off by using the `f.seek()` and `f.tell()` methods. –  Mar 08 '18 at 06:06
  • I am reading and not writing to a file. Can you elaborate on what methods would be preferred in this case? –  Mar 08 '18 at 06:07
  • @mikey since you don't have anything to do after reading the file, you can use either approach, i.e. return within or outside the with block. It won't really make any difference. For your second query, https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python/519653#519653 suggests that a line-based file doesn't need to be divided into chunks, since iterating over it processes the file one line at a time, so you can append to contents after reading each line of the file. – hsnsd Mar 08 '18 at 12:12
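
The resume-where-it-left-off idea from the comments could look roughly like the sketch below. It is only an illustration under assumptions: `read_rows`, its parameters, and the five-row temporary file are hypothetical names, not part of the question's code. It uses `readline()` rather than `for row in f`, because in text mode `f.tell()` raises an error while the file is being iterated with `for`.

```python
import os
import tempfile

def read_rows(fpath, offset=0, row_limit=1000):
    """Read up to row_limit rows, starting at an offset previously returned by tell().

    Returns the rows read and the offset at which the next call should resume.
    """
    rows = []
    with open(fpath, "r") as f:
        f.seek(offset)
        while len(rows) < row_limit:
            line = f.readline()  # readline() keeps tell() usable in text mode
            if not line:         # empty string means end of file
                break
            rows.append(line.split())
        return rows, f.tell()

# Hypothetical five-row sample file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a 1\nb 2\nc 3\nd 4\ne 5\n")
    sample_path = tmp.name

contents, offset = [], 0
while True:
    rows, offset = read_rows(sample_path, offset, row_limit=2)
    if not rows:
        break
    contents.extend(rows)  # in practice, process each batch here instead of keeping it
os.remove(sample_path)
```

Each call opens the file, reads at most `row_limit` rows, and hands back the position for the next call, so only one batch needs to be held in memory at a time.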

You can also put the return inside the with block.

Cleanup is done whenever the context exits. That is the power of with: you don't need to handle every possible exit path yourself. Note that the exit handler also runs when an exception is raised inside the with block.
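
A small sketch of that behaviour, assuming a made-up helper `parse_first_field` and temporary files created just for the demo. The `handles` list exists only so the file objects can be inspected after the function has exited.

```python
import os
import tempfile

handles = []  # keeps references to the file objects so they can be inspected later

def parse_first_field(fpath):
    """Hypothetical helper: parse the first field of the first row as a float."""
    with open(fpath, "r") as f:
        handles.append(f)
        # Returning here still triggers the cleanup; so does a ValueError from float().
        return float(f.readline().split()[0])

def write_temp(text):
    """Write text to a temporary file and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        return tmp.name

good_path = write_temp("1996 02 08\n")
bad_path = write_temp("not_a_number 02 08\n")

value = parse_first_field(good_path)
closed_after_return = handles[-1].closed

raised = False
try:
    parse_first_field(bad_path)
except ValueError:
    raised = True
closed_after_error = handles[-1].closed

os.remove(good_path)
os.remove(bad_path)
print(closed_after_return, closed_after_error)
```

Both flags come out true: the file is closed whether the block is left by a return or by an exception.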

But if the file is empty (as an example), you should still return something. In that case your code is clear and follows the principle of a single exit path. If instead you need to handle reaching the end of the file without finding something important, I would put the normal return inside the with block and handle the special case after it.
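
As a rough sketch of that last pattern, assuming a hypothetical search function `find_first_row` that looks for a row by its second field (the month, in data shaped like the question's sample):

```python
import os
import tempfile

def find_first_row(fpath, month):
    """Hypothetical search: return the first row whose second field equals month."""
    with open(fpath, "r") as f:
        for row in f:
            fields = row.split()
            if len(fields) > 1 and fields[1] == month:
                return fields  # normal return, inside the with block
    return None  # special case: end of file reached without a match

# Temporary two-row sample file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1996 02 08 263\n1996 03 02 88\n")
    sample_path = tmp.name

found = find_first_row(sample_path, "03")
missing = find_first_row(sample_path, "07")
os.remove(sample_path)
```

Here `found` holds the March row and `missing` is None, which also covers the empty-file case.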

Giacomo Catenazzi