More efficient and general way to prepend text before lines of a file?

Question

I am new to python. In one of task I have to add a character before specific lines.For example, in my text file the

Name

and

Surname

are the fixed lines on which I have to either add or delete ; based on flag

hey : see
;Name:blah
;Surname:blah

This is the code I have written for the same... Is it efficient enough? Can we write more efficient and can we pass

Name and Surname

as arguments I mean the keywords as an arguments to the function to which add

;

def changefile(filepath,flag):
   # flag = 1
    final = ""
    with open(filepath) as content:
        for line in content:
            if flag==1:
                if line.split(":")[0]==";Name" or line.split(":")[0]==";Surname":
                    final += line[1:]
                else:
                    final += line
            else:
                if line.split(":")[0]=="Name" or line.split(":")[0]=="Surname":
                    final += ";"
                final += line
    f = open(filepath, 'r+')
    f.truncate()
    f.write(final)
    f.close()


changefile("abc.txt",0)

If it works, it works.... this is not [Code Review](http://codereview.stackexchange.com/). That said, I can't think off the top of my head why this wouldn't be a good solution. — Aaron, Jul 12 '16 at 17:19
string += string is an anti-pattern in Python; although [CPython does optimize it](http://stackoverflow.com/a/19926932/478656), it's not very Pythonic, and building a list and then joining it is more likely to be performant over more versions and implementations of Python. That's the biggest potential "inefficiency" I can see. That said, [don't optimize unless you know you need to.](http://stackoverflow.com/a/1316959/478656). I'd be much more concerned about the readability of things like your vague flag name, your vague function name, using 1,0 instead of True,False. — TessellatingHeckler, Jul 12 '16 at 18:28

TessellatingHeckler · Answer 1 · 2016-07-14T17:56:16.883

I poked at it a lot, and borrowed martineau's ideas, and ended up with this:

def change_file(filepath, add_comment, trigger_words):

    def process(line):
        line_word = line.lstrip(';').split(':')[0]

        if line_word in trigger_words:
            if add_comment:
                line = line if line.startswith(';') else ';' + line
            else:
                line = line.lstrip(';')

        return line


    with open(filepath) as f:
        content = [process(line) for line in f]


    with open(filepath, 'r+') as f:
        f.truncate()
        f.write(''.join(content))


change_file('abc.txt', add_comment=True, trigger_words=["sys", "netdev"])

The main "nice" bit (that I like) is using a list comprehension [process(line) for line in f], because it does away with the whole final = ''; final += blah arrangement. It processes every line and that's the output.

I've changed the flag so instead of reading "flag is 0 or 1" (what does that mean?) it now reads "add_comment is True or False", to more clearly indicate what it does.

In terms of efficiency, it could be better; (make "trigger_words" a set so that testing membership was faster, change the way it normalizes every line for testing); but if you're processing a small file it won't make much difference Python is fast enough, and if you're processing an enormous file, it's more likely IO limited than CPU limited.

Try it online here: https://repl.it/CbTo/0 (it reads, and prints the results, it doesn't try to save).

(NB. .lstrip(';') will remove all semicolons at the start of the line, not just one. I'm assuming there's only one).

Edit from the comments. Here's a version which will process the SQL Server UTF-16 install file without screwing it up, however I don't know of a general fix for this that will work for all files. Note this reads the file as a specific data encoding, and writes a binary file with a specific data encoding. And changes the split to = for the SQL ini format. And doesn't truncate because w mode does that.

import codecs

def change_file(filepath, add_comment, trigger_words):

    def process(line):
        line_word = line.lstrip(';').split('=')[0]

        if line_word in trigger_words:
            if add_comment:
                line = line if line.startswith(';') else ';' + line
            else:
                line = line.lstrip(';')

        return line


    with codecs.open(filepath, encoding='utf-16') as f:
        content = [process(line) for line in f]

    with codecs.open(filepath, 'wb', encoding='utf-16') as f:
        f.write(''.join(content))


change_file('d:/t/ConfigurationFile - Copy.ini', add_comment=True, trigger_words=["ACTION", "ENU"])

thanks for the solution it works fine for the txt file as mentioned in the question... but when i apply same solution to config file my config file gets damaged.... can u help me in that — user2591307, Jul 14 '16 at 06:33
@user2591307 maybe, if you explain what you mean by 'damaged', because I can't guess. — TessellatingHeckler, Jul 14 '16 at 07:38
@TessellatingHeckler: FWIW, I didn't change `flag` into something like your `add_comment` because that's not all that it controls—although it is two valued like a boolean—it means "add comment" or "remove comment" which doesn't exactly correspond to True and False. Logically it should probably be named something like `operation` and have one of two enumerated values. Python 3 has a new `enum` module, but in Python 2 one has to simulate them (which isn't too difficult). — martineau, Jul 14 '16 at 12:06
@user2591307 I tried a SQL install file; it is (*writing saved as UTF-16 format*) - two bytes means one character. Python 2 reads text as if each byte is a character, so it sees double the data. `hi\r\n` (`\r\n` is a newline in Windows) becomes `h0i0\r0\n0`- a mess. Write that back, and Python doesn't see `\r\n` because it was separated, it "helpfully" changes `\n` into `\r\n` - oops! - `h0i0\r0\r\n0`. *One more byte*. Open in Notepad and all the "*two bytes means one character*" pairs have been shifted by one place, half a character, after the newline. Result: wrong characters show up. — TessellatingHeckler, Jul 14 '16 at 17:49
Interestingly, if it's shifting out of place by one at each newline, it never falls back into sync in that file because after the first line, every newline has a blank line after it. Always a double newline keeps it out of place for the rest of the file. If it shifted out of sync and back in, it [would look like this](http://i.imgur.com/AaQQLpz.jpg). Anyway. I can fix my code for *this file* but then it won't work on plain ASCII files. I don't know if there's a fix for everything. Will edit. — TessellatingHeckler, Jul 14 '16 at 17:53
@martineau fair comment; I wrote it a couple of times trying to make it a generic `prefix/unprefix` system, then separate `add_comment()/remove_comment()` functions, then merged them back in to one function and they became `add_comment True/False`. I agree that doesn't read very well - your ENUM approach sounds more sensible. — TessellatingHeckler, Jul 14 '16 at 17:59
@TessellatingHeckler can you add comments to above so that it is easy to understand — user2591307, Jul 29 '16 at 17:22

More efficient and general way to prepend text before lines of a file?

1 Answers1