Removing lines from a file using python

Question

Possible Duplicate:
Python 3 regular expression to find multiline comment

I need some input on how this can be done and would really appreciate your help. I looked at other posts, but none of them matches my requirement.

Related posts I looked at: "How to remove line from the file in python" and "Remove lines from textfile with python".

I need to match a multi-line comment in a file based on an input string.

Example:

Let's say the file "test.txt" contains the following comment. If inputstring = "This is a test, script written", this comment needs to be deleted from the file.

import os
import sys

import re
import fnmatch

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                with open(fullname, "r") as f:
                    find_and_remove(f, r"This is a test, script written")

Error:-

Traceback (most recent call last):
  File "comment.py", line 16, in <module>
    find_and_remove(f, r"This is a test, script written")
  File "comment.py", line 8, in find_and_remove
    return re.sub(pattern, "", haystack)
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
TypeError: expected string or buffer

Stop reposting this question. Please. – Tim Jan 09 '13 at 06:59
@Tim - I just need ideas to work on..what is wrong in that? – user1927396 Jan 09 '13 at 07:04

sea-rob · Answer 1 · 2013-01-09T07:49:56.727

The first thing that came to mind when I saw the question was "state machine", and whenever I think "state machine" in Python, the first thing that comes to mind is "generator", a.k.a. yield:

def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    is_comment = False

    for line in f:
        if line.strip().startswith('/*'):
            is_comment = True

        if line.strip().endswith('*/'): 
            is_comment = False
        elif is_comment:
            pass
        else:
            yield line


def print_file(file_name):
    with open(file_name, 'r') as f:
        skipper = skip_comments(f)

        for line in skipper:
            print line,

EDIT: user1927396 upped the ante by specifying that it's just a specific block to exclude, that contains specific text. Since it's inside the comment block, we won't know up front if we need to reject the block or not.

My first thought was buffer. Ack. Poo. My second thought was a haunting refrain I've been carrying in my head for 15 years and never used until now: "stack of state machines" ...

def squelch_comment(f, first_line, exclude_if):
    """
    Comment is a multi-line comment that we may want to suppress
    """
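    # Drop the block right away if the opening line already contains the text.
    comment = None if exclude_if in first_line else [first_line]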

    if not first_line.strip().endswith('*/'):
        for line in f:

            if exclude_if in line:
                comment = None

            if comment and len(comment):
                comment.append(line)

            if line.strip().endswith('*/'):
                break

    if comment:
        for comment_line in comment:
            yield '...' + comment_line


def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    for line in f:
        if line.strip().startswith('/*'):
            # hand off to the nested, comment-handling, state machine
            for comment_line in squelch_comment(f, line, 'This is a test'):
                yield comment_line
        else:
            yield line


def print_file(file_name):
    with open(file_name, 'r') as f:
        for line in skip_comments(f):
            print line,
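For readers on Python 3, the buffering idea can also be sketched as a single generator. This is only an illustration, not the code from the answer: the name `drop_comments_containing` and the sample text are made up, and it keeps the same assumption that a comment opens at the start of a line and closes at the end of one.

```python
import io

def drop_comments_containing(lines, needle):
    """Yield lines, skipping any /* ... */ block that contains needle."""
    buffered = None  # None while outside a comment, else the lines seen so far
    for line in lines:
        stripped = line.strip()
        if buffered is None and stripped.startswith("/*"):
            buffered = []
        if buffered is None:
            yield line
            continue
        buffered.append(line)
        if stripped.endswith("*/"):
            # The comment is complete: emit it only if the needle is absent.
            if not any(needle in item for item in buffered):
                for item in buffered:
                    yield item
            buffered = None
    # An unterminated comment at the end of the input is emitted unchanged.
    if buffered:
        for item in buffered:
            yield item

# io.StringIO stands in for an opened file (an assumption for the demo).
sample = io.StringIO(
    "/*\n * This is a test, script written\n */\n"
    "some code\n"
    "/*\n * And should not be removed\n */\n"
)
kept = "".join(drop_comments_containing(sample, "This is a test"))
print(kept)
```

Because the decision is made only once the closing line has been seen, it does not matter on which line of the comment the text appears.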

Rob - I only want to remove a specific multi-line comment ,not all the multi-line comments — user1927396, Jan 09 '13 at 07:13
Hi Rob - I understood that part but how do i tell the script to do both..it needs to know if .startswith('/*') and it contains "This is a test, script written" — user1927396, Jan 09 '13 at 07:28
(deleted my earlier comment ;) ) that's a little harder, because you're getting into look-ahead logic. What I'd do in that case is instead of "pass" for the line, shove all the comment lines into an list. When you get to the end of the comment, iterate through the list. If you see the target line there, pass. If you don't see it, yield each line in the list. ...look-aheads complicate everything ;) — sea-rob, Jan 09 '13 at 07:31
looks complicated..how do we pass all the lines in the list based on a single input string/line(lets say,This is a comment line) ? — user1927396, Jan 09 '13 at 07:42
Thanks for helping out..major problem with the code is the string is not exactly what it is ..need to take care of white spaces in between and make the dot(.)'s optional...even if its the exact same line.it not working :-( — user1927396, Jan 09 '13 at 08:19
anything more you can help..I tried yours and gauden..both of them are not working — user1927396, Jan 09 '13 at 09:39
use gauden's ... it's much nicer. You might have to go through the usual troubleshooting steps like adding prints or logging statements to make sure you're getting what you expect at each step. — sea-rob, Jan 09 '13 at 16:42

score 1 · Answer 2 · answered Jan 09 '13 at 07:42

This should work in principle:

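def skip(file, lines):
    # lines holds the zero-based numbers of the lines to leave out
    cline = 0
    result = ""
    for fileLine in file:  # iterate line by line; file.read() would yield single characters
        if cline not in lines:
            result += fileLine
        cline += 1
    return result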

lines must be a list of zero-based line numbers to skip, and file must be an opened file.
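To make the calling convention concrete, here is a small Python 3 sketch. It assumes the loop walks the file line by line and that line numbers start at zero; the name `skip_lines` and the sample text are invented for the example.

```python
import io

def skip_lines(handle, lines):
    """Return the text of handle without the zero-based line numbers in lines."""
    unwanted = set(lines)
    kept = []
    for number, line in enumerate(handle):
        if number not in unwanted:
            kept.append(line)
    return "".join(kept)

# io.StringIO stands in for an opened file (an assumption for the demo).
sample = io.StringIO("int a;\n/* drop me */\nint b;\n")
result = skip_lines(sample, [1])
print(result)
```

Note that this only helps once you already know which line numbers the comment occupies; finding them is a separate step.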

Arnaud Aliès

This might not work for me since my file contains not only numbers.its a regular c code – user1927396 Jan 09 '13 at 07:48
you obviously don't understand the code – Arnaud Aliès Jan 09 '13 at 07:49
you might be right 50% .can you please explain the code? – user1927396 Jan 09 '13 at 07:51

score 1 · Accepted Answer · edited May 23 '17 at 11:48

This one does what was requested: it deletes every multi-line comment that contains the desired string.

Put this in a file called program.txt:

/*
 * This is a test, script written
 * This is a comment line
 * Multi-line comment
 * Last comment
 *
 */

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

Then search and replace. Just make sure the needle does not contain unescaped regex special characters, because it is inserted into the pattern as-is.

import re

def find_and_remove(haystack, needle):
    # The needle is inserted into the pattern as-is, so any regex special
    # characters in it must already be escaped by the caller.
    pattern = re.compile(r'/\*.*?' + needle + r'.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

# assuming your program is in a file called program.txt
with open("program.txt", "r") as f:
    program = f.read()

print(find_and_remove(program, r"This is a test, script written"))

The result:

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

It adapts the regex in the related question
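One caveat with a pattern of the form `/\*.*?needle.*?\*/`: with `re.DOTALL` the match begins at the first `/*` in the text, so if an earlier comment does not contain the needle, the lazy `.*?` can run past that comment's `*/` and swallow the code in between. A possible workaround, sketched here for Python 3, is to match each comment on its own and decide per comment. The names `COMMENT_RE` and `remove_comments_containing` are made up for this sketch, and it does not try to handle `/*` inside string literals.

```python
import re

# Matches one complete /* ... */ block at a time (lazy, spans newlines).
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

def remove_comments_containing(text, needle):
    """Delete every /* ... */ block whose body contains needle."""
    def replace(match):
        comment = match.group(0)
        # Plain substring test, so the needle needs no regex escaping.
        return "" if needle in comment else comment
    return COMMENT_RE.sub(replace, text)

source = (
    "/* keep this one */\n"
    "int a;\n"
    "/*\n * This is a test, script written\n */\n"
    "int b;\n"
)
cleaned = remove_comments_containing(source, "This is a test, script written")
print(cleaned)
```

Since the needle is compared with `in` rather than compiled into the pattern, parentheses and dots in it are treated literally.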

Editing the last section in your code:

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # read the whole file into a string, then search and replace...
                with open(fullname) as f:
                    text = f.read()
                result = find_and_remove(text, r"This is a test, script written")

                new_name = fullname + ".new"
                # After testing, replace new_name with fullname in the
                # next line in order to overwrite the original file.
                with open(new_name, 'w') as handle:
                    handle.write(result)

Make sure you escape all regex special characters in the needle, e.g. ( and ). If your text contains parentheses, e.g. (any text), they should appear in the needle as \(any text\). The standard library's re.escape(needle) produces such an escaped string for you.
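As a rough illustration of the escaping step in Python 3, the variant below runs the needle through `re.escape` before building the pattern. The function name `find_and_remove_literal` and the sample text are assumptions made for this sketch; the pattern itself is the same shape as in `find_and_remove`.

```python
import re

def find_and_remove_literal(haystack, needle):
    """Like find_and_remove, but treat needle as plain text, not as a regex."""
    pattern = re.compile(r"/\*.*?" + re.escape(needle) + r".*?\*/", re.DOTALL)
    return pattern.sub("", haystack)

text = "/*\n * Written by (any text)\n */\nint main(void);\n"
stripped = find_and_remove_literal(text, "(any text)")
print(stripped)
```

Escaping only makes the needle literal. The text in the file still has to match it character for character, including whitespace and line breaks.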

answered Jan 09 '13 at 07:51

daedalus


how easy it is to get it to work on files? – user1927396 Jan 09 '13 at 08:01
i made some changes to your code to loop over a directory and running into a compilation error..I updated my original question with it..can you please see what is wrong? – user1927396 Jan 09 '13 at 08:29
You need to pass a string into the function I gave you, you were instead passing a file handle. This should clear it. – daedalus Jan 09 '13 at 08:54
Script seems to work fine but I am not seeing the result... the exact code is http://pastie.org/5653294 and the sample input file I am trying is http://pastie.org/5653293. Any idea what is wrong here? – user1927396 Jan 09 '13 at 09:08
Thanks,waiting for your edit – user1927396 Jan 09 '13 at 09:21
@Guaden - We need to write to the same file name,cannot change the file name...also even with the new file model..I dont see the comment removed...looks like this is very challenging – user1927396 Jan 09 '13 at 09:34
See my latest edit. Test carefully before you do that. Use this system for testing, then simply write back out to the original filename in order to replace. I leave that to you after you have checked the search, replace, rename actually works as you want it. (Do remember to uptick the answer). – daedalus Jan 09 '13 at 09:41
is that typo for new_name and newname?does your code for the input i gave?it doesnt for me – user1927396 Jan 09 '13 at 09:46
I tried re.escape(needle) but it still doesnt work...there is something am missing..appreciate your help on this..did it work for you on the input here http://pastie.org/5653293 if the string is "Copyright (c) 2012, The Linux Foundation. All rights reserved." – user1927396 Jan 09 '13 at 17:28
I tried the same by simply using `The Linux Foundation`as the string to avoid the brackets and fullstop and it worked for me, rather than looking for the whole line. I am not sure if it appears in other parts of the scripts. – daedalus Jan 09 '13 at 18:00
