13

A Python program I'm writing is to read a set number of lines from the top of a file, and the program needs to preserve this header for future use. Currently, I'm doing something similar to the following:

header = ''
header_len = 4
for i in range(1, header_len):
    header += file_handle.readline()

Pylint complains that I'm not using the variable i. What would be a more pythonic way to do this?

Edit: The purpose of the program is to intelligently split the original file into smaller files, each of which contains the original header and a subset of the data. So, I need to read and preserve just the header before reading the rest of the file.

mkj
GreenMatt

9 Answers

13
f = open('fname')
header = [next(f) for _ in range(header_len)]

Since you're going to write header back to the new files, you don't need to do anything with it. To write it back to the new file:

open('new', 'w').writelines(header + list_of_lines)

If you know the number of lines in the old file, list_of_lines would become:

list_of_lines = [next(f) for _ in range(chunk_len)]
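A minimal sketch of how the header-plus-chunk idea above might be put together for the splitting task described in the question. The function name `split_with_header` and the in-memory `io.StringIO` stand-in for a real file are assumptions for illustration only. It uses `itertools.islice` rather than `next(f)`, because `next(f)` raises `StopIteration` once the file runs out, whereas `islice` simply returns a shorter final chunk.

```python
import io
from itertools import islice

def split_with_header(src, header_len, chunk_len):
    """Yield lists of lines: the header followed by up to chunk_len data lines."""
    # Hypothetical helper, not part of any library.
    header = list(islice(src, header_len))
    while True:
        chunk = list(islice(src, chunk_len))
        if not chunk:
            break  # no data lines left
        yield header + chunk

# Stand-in for a real file: 2 header lines followed by 5 data lines.
src = io.StringIO("h1\nh2\n" + "".join("d%d\n" % i for i in range(5)))
pieces = list(split_with_header(src, header_len=2, chunk_len=2))
```

Each element of `pieces` could then be passed to `writelines` on a separate output file.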
SilentGhost
12

I'm not sure what the Pylint rules are, but you could use the '_' throwaway variable name.

header = ''
header_len = 4
for _ in range(header_len):  # range(1, header_len) would read only header_len - 1 lines
    header += file_handle.readline()
David Claridge
  • You don't need to use the for loop. I recommend a list comprehension (see my post below). Good call on the throwaway variable, though. – Escualo Dec 11 '09 at 05:52
  • @Roger Pate: can you explain? – Escualo Dec 11 '09 at 06:08
  • @unknown, there's nothing wrong with using for loops. for loops are integral part of Python and are basic concepts of programming. If somebody says otherwise not to use it, tell them to take a hike – ghostdog74 Dec 11 '09 at 13:48
  • You learn something new everyday - I didn't know about the _ variable. Thanks! +1 – GreenMatt Dec 13 '09 at 23:06
10
import itertools

header_lines = list(itertools.islice(file_handle, header_len))
# or
header = "".join(itertools.islice(file_handle, header_len))

Note that with the first, the newline characters will still be present. To strip them:

header_lines = list(n.rstrip("\n")
                    for n in itertools.islice(file_handle, header_len))
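As a rough illustration of how `islice` behaves here, the sketch below uses an in-memory `io.StringIO` object as an assumed stand-in for `file_handle`. It shows that only the header lines are consumed, so the remaining lines are still available from the same handle, and that a file shorter than `header_len` yields fewer lines rather than raising an error.

```python
import io
from itertools import islice

header_len = 2

# Stand-in for an open file with five lines.
file_handle = io.StringIO("a\nb\nc\nd\ne\n")
header = "".join(islice(file_handle, header_len))
rest = list(file_handle)  # iteration resumes right after the header

# A file with fewer lines than header_len: islice just stops early.
short_handle = io.StringIO("only\n")
short_header = list(islice(short_handle, header_len))
```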
  • If you strip the lines it will be difficult to recall the structure of the original header. I recommend you keep them. – Escualo Dec 11 '09 at 05:53
  • No, it won't. In that example they are stored in a list rather than one long string. Which he should use depends on what he's doing with the data later. –  Dec 11 '09 at 05:57
  • The OP writes in his script 'header += ...' so I think he meant a single string, but you are right: it depends. – Escualo Dec 11 '09 at 06:15
  • Arrieta: that's why I used separate header and header_lines variables. –  Jan 03 '10 at 13:27
  • Anurag: your own answer doesn't even use "for line in f", nor do any of the answers I currently see iterate the file directly---if anything, itertools is the only solution here that uses the file as an iterator and is thus the closest answer to "for line in f". –  Jan 03 '10 at 13:29
4

My best answer is as follows:

file test.dat:

This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9

Python script:

f = open('test.dat')
nlines = 4
header = "".join(f.readline() for _ in range(nlines))

Output:

>>> header
'This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n'

Notice that you don't need to import any modules; also, you could use any dummy variable in place of _ (it works with i, or j, or ni, or whatever), but I recommend you don't (to avoid confusion). You could strip the newline characters (though I don't recommend it, since keeping them lets you distinguish among lines) or do anything else that you can do with strings in Python.

Notice that I did not provide a mode for opening the file, so it defaults to "read only" - this is not Pythonic; in Python "explicit is better than implicit". Finally, nice people close their files; in this case it is automatic (because the script ends) but it is best practice to close them using f.close().
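One way to make both points explicit is a `with` block, which closes the file when the block exits, combined with an explicit `"r"` mode. This is only a sketch: the temporary file created with `tempfile` is an assumed stand-in for test.dat so that the example is self-contained.

```python
import os
import tempfile

# Create a throwaway file with nine lines, standing in for test.dat.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("".join("This is line %d\n" % i for i in range(1, 10)))
    path = tmp.name

nlines = 4
# Explicit read mode; the file is closed when the with block exits.
with open(path, "r") as f:
    header = "".join(f.readline() for _ in range(nlines))

os.remove(path)  # clean up the throwaway file
```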

Happy Pythoning.

Edit: As pointed out by Roger Pate the square brackets are unnecessary in the list comprehension, thereby reducing the line by two characters. The original script has been edited to reflect this.

Escualo
  • 2
    When you don't actually need a list and any iterable will work, such as the parameter to `"".join` here, then a generator expression is better, easier (by two keystrokes ;), and more clear than a list comprehension: `"".join(..)` instead of `"".join([..])`. They are related, and a LC is actually a special case of a genexp (in my view at least), where `[..]` is just convenience for `list(..)`. http://www.python.org/dev/peps/pep-0289/ –  Dec 11 '09 at 06:15
  • 1
    yes i did read. I still want you to close it for the benefit of others who only want to see code and doesn't want to read. – ghostdog74 Dec 11 '09 at 10:07
  • @Arrieta: Did NASA approve your use of their logo? ;-p – GreenMatt Dec 13 '09 at 23:02
  • Actually in `join` you have to use a list comprehension and not an iterator for performance ;) – Mr_and_Mrs_D Jan 28 '17 at 14:49
1

Maybe this:

header_len = 4
header = open("file.txt").readlines()[:header_len]

But, it will be troublesome for long files.

mshsayem
  • 5
    .readlines() reads the entire file, though.. if you have a large file and don't want to read the whole thing into memory, this could be a bad idea – David Claridge Dec 11 '09 at 05:05
  • yeah, I have added that while you were writing this, ;) – mshsayem Dec 11 '09 at 05:06
  • 1
    @david : guido please make it lazy lazy very lazy...http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – Pratik Deoghare Dec 11 '09 at 05:10
  • 2
    There's no need, now that we have `itertools.islice`. – Robert Rossney Dec 11 '09 at 09:15
  • +1 for simplicity and OP can use the rest of the list items easily to split into smaller files. readlines() does read the entire file, but I am not going to -1 you for that, since we don't know if OP's files are that big in the GB range, so it might still be ok for OP to use this method. – ghostdog74 Dec 11 '09 at 10:15
1

I do not see anything wrong with your solution; maybe just replace i with _. I also do not like invoking itertools wherever a simpler solution will work; it is like people using jQuery for trivial JavaScript tasks. Anyway, just to take revenge on itertools, here is my solution.

Since you want to read the whole file line by line anyway, why not read the header first and then do whatever you want with the rest?

header = ''
header_len = 4

for i, line in enumerate(file_handle):
    if i < header_len:
        header += line
    else:
        # output chunks to separate files
        pass

print(header)
Anurag Uniyal
0

What about:

header = ''
header_len = 4
for i, line in enumerate(file_handle):
    if i < header_len:
        header += line  # accumulate the header as a single string
        continue
    # process the rest of the file here
Claudiu
0

One problem with using _ as a dummy variable is that it only solves the problem on one level. Consider something like the following:

def f(n, m):
    """A function to run g() n times and run h() m times per g."""
    for _ in range(n):
        g()
        for _ in range(m):
            h()
    return 0

This function works fine but the _ iterator over m runs is problematic as it may conflict with the upper _. In any case PyCharm is complaining about this kind of syntax.

So I would argue that _ is not as "throwaway" as was suggested before.
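For what it's worth, a quick check suggests that the reuse of _ is a shadowing concern flagged by linters rather than a behavioral one: the outer `for` draws its next value from its own iterator, so rebinding _ in the inner loop does not change how many times either loop runs. In the sketch below, `g`, `h`, and the `calls` list are stand-ins assumed for illustration.

```python
calls = []

def g():
    calls.append("g")  # record each outer-loop call

def h():
    calls.append("h")  # record each inner-loop call

def f(n, m):
    """Run g() n times and run h() m times per g."""
    for _ in range(n):
        g()
        for _ in range(m):  # rebinding _ here does not disturb the outer loop
            h()
    return 0

f(2, 3)
```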

Perhaps you might like to just create a function to do it!

def run(f, n, *args):
    """Runs f with the arguments from the args tuple n times."""
    for _ in range(n):
        f(*args)

e.g. you could use it like this:

>>> def ft(x, L):
...     L.append(x)

>>> a = 7
>>> nums = [4, 1]
>>> run(ft, 10, a, nums)
>>> nums
[4, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
Gregory Fenn
-1
s = ""
f = open("file")
for n, line in enumerate(f):
    if n <= 3:
        s = s + line  # the first four lines form the header
    else:
        # do something here to process the rest of the lines
        pass
print(s)
f.close()
ghostdog74
  • He seems to want the result in a single string (notice he writes header += ...) – Escualo Dec 11 '09 at 06:06
  • 1
    I think this implementation is overly complicated for such a simple task; it reads like C on Python - take advantage of the "Batteries Included" philosophy and use the existing methods on the objects. – Escualo Dec 11 '09 at 06:29
  • overly complicated?? what criteria do you use to judge?? number of characters of code? number of lines of code?? Batteries included?? What kind of batteries are you talking about that i am not using? you can test my code versus your code with millions of lines, and they both perform on par. So what's the deal? – ghostdog74 Dec 11 '09 at 06:55
  • 3
    The "Batteries Included" is a motto of the Python Language (cf. website) "Fans of Python use the phrase "batteries included" to describe the standard library". What I mean is that your style is not taking advantage of the Standard Library and, by doing so, you are reinventing the wheel. This is not in line with Python's philosophy. By reinventing the wheel you condemn others to understand your logic (which could be difficult in some cases): by using the Standard Library you can express your ideas at a higher level of abstraction and don't distract your code logic with wheel reinventions. – Escualo Dec 11 '09 at 08:04
  • No need in going around downvoting - this is a place to learn and you cannot get offended by people commenting on your code. If you cannot stand the heat, keep out of the kitchen. – Escualo Dec 11 '09 at 08:09
  • I've program Python since ver 1.5 and i do know what batteries included mean. So If I use std library, you would understand right away what i am writing? For eg, itertools. Older ver of Python may not have it. Also, for this simple task, there is no need to use it or other libraries. Sometimes, going down to the basics is still advantageous. If people don't understand what i write as in my solution, i can only say they are not understanding their basics. I can stand the heat, but not when the comments are ridiculous and based on subjective personal opinions and not tackling the problem at hand – ghostdog74 Dec 11 '09 at 09:47
  • finally i want to say, I use standard libraries when i need to. Other than that as for this OP's case, its so simple, there's no need to use one. – ghostdog74 Dec 11 '09 at 09:49