
I tried various regexes, including the ones mentioned in "Python 3 regular expression to find multiline comment", to match my input file (linked below; the complete code is also below). This is the regex I am currently using to match the input file at http://pastie.org/5653293:

pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)

Can someone explain why the regex is not matching?

import os
import sys
import re
import fnmatch

def find_and_remove(haystack, needle):
    re.escape(needle)
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # put all the text into f and read and replace...
                f = open(fullname).read()
                result = find_and_remove(f, r"Copyright (c) 2012, The Linux Foundation. All rights reserved")

Input: http://pastie.org/5653293

Asked by user1927396 (edited by Community)
  • You cannot parse programming languages with regexes. Consider something like https://bitbucket.org/eliben/pycparser – georg Jan 09 '13 at 10:50
  • And I'm wondering what's the point of removing TLF's copyrights from sources? – georg Jan 09 '13 at 10:52
  • @thg435 - Maybe it's overkill for this exercise, unless you prove me wrong. I looked at the docs linked above; they don't have good examples. Have you used it before? – user1927396 Jan 09 '13 at 10:52
  • @thg435 - I'm just removing the old 2012 comments, since it's 2013. – user1927396 Jan 09 '13 at 10:53
  • No, I haven't used it before. I guess the API is similar to Python's ast module. You might want to ask the author (http://stackoverflow.com/users/8206/eli-bendersky) for explanations and examples. – georg Jan 09 '13 at 10:58
  • @thg435 - I will check that in parallel. Is there anything we can do with the existing code? I notice that it fails to match only this copyright comment; it works for every other comment, so I'm sure we can tweak it a little. – user1927396 Jan 09 '13 at 11:00
  • The first thing to try is to `re.escape` the "needle" to disable regex special characters (like `()`). – georg Jan 09 '13 at 11:01
  • @thg435 - I actually tried re.escape(needle); it did not work. – user1927396 Jan 09 '13 at 11:06
  • @thg435 - I kept trying different options, but nothing is working for me. Do you have any idea why the regex is not matching? – user1927396 Jan 09 '13 at 17:36
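A likely reason re.escape "did not work": re.escape() returns a new string and does not modify its argument, so the bare `re.escape(needle)` call on the first line of find_and_remove in the question leaves `needle` unchanged. A minimal sketch showing the difference, using the question's needle:

```python
import re

needle = "Copyright (c) 2012, The Linux Foundation. All rights reserved"

# re.escape() returns a new string and leaves its argument untouched,
# so a bare call like this one (as in the question) has no effect.
re.escape(needle)
print(re.search(needle, needle))   # None: "(c)" is still a capture group

# Keep the return value to get a pattern that matches the literal text.
escaped = re.escape(needle)
print(re.search(escaped, needle) is not None)   # True
```

In the question's function, the fix is `needle = re.escape(needle)`.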

1 Answer


Use "Copyright \(c\) 2012, The Linux Foundation. All rights reserved" as the needle. You need to escape the parentheses because they already have a meaning (grouping and capture) in regular expressions.

Search for more information on how to escape special characters in regexes.
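To illustrate the answer, here is a small sketch that reuses the question's pattern shape with both the unescaped and the hand-escaped needle. The `text` sample and the `comment_pattern` helper are hypothetical, not taken from the question's input file:

```python
import re

# Hypothetical sample input: one C comment holding the copyright line.
text = "/*\n * Copyright (c) 2012, The Linux Foundation. All rights reserved.\n */\nint x;\n"

# Unescaped: "(c)" is a group that matches just the letter "c",
# so this needle only matches "Copyright c 2012, ...".
unescaped = r"Copyright (c) 2012, The Linux Foundation. All rights reserved"

# Escaped by hand, as the answer suggests: "\(" and "\)" match literal parentheses.
escaped = r"Copyright \(c\) 2012, The Linux Foundation. All rights reserved"

def comment_pattern(needle):
    # Same shape as the question's pattern: /* ... needle ... */ with DOTALL.
    return re.compile(r"/\*.*?" + needle + r".*?\*/", re.DOTALL)

print(comment_pattern(unescaped).search(text))         # None
print(repr(comment_pattern(escaped).sub("", text)))    # '\nint x;\n'
```

The unescaped `.` in "Foundation." still matches a literal period (it matches any character), which is why only the parentheses break the match here.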

Answered by tohava
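Putting the pieces together, here is one possible corrected version of the question's script. It is a sketch under a few assumptions: the edited text should be written back to the same file (the original script computes `result` but never saves it), files are readable with the default encoding, and you have a backup or version control. It also swaps in a different comment pattern, a "tempered" token `(?:(?!\*/).)`, because the question's `.*?` can start at an earlier comment's `/*` and run through code into the copyright comment when that comment is not the first one in the file. `PATTERNS` and `process_tree` are hypothetical names.

```python
import fnmatch
import os
import re
import sys

# File patterns copied from the question.
PATTERNS = ["*.cpp", "*.c", "*.h", "*.txt"]

def find_and_remove(haystack, needle):
    # re.escape() returns the escaped string; keep the result.
    needle = re.escape(needle)
    # Tempered token: any character, as long as it does not start "*/",
    # so a match cannot cross from one comment into the next.
    body = r"(?:(?!\*/).)*?"
    pattern = re.compile(r"/\*" + body + needle + body + r"\*/", re.DOTALL)
    return pattern.sub("", haystack)

def process_tree(root, needle):
    for path, _dirs, files in os.walk(root):
        for fname in files:
            # any() avoids processing a file twice if it matches two patterns.
            if any(fnmatch.fnmatch(fname, pat) for pat in PATTERNS):
                fullname = os.path.join(path, fname)
                with open(fullname) as f:
                    text = f.read()
                result = find_and_remove(text, needle)
                if result != text:
                    # Assumption: overwrite in place, only when something changed.
                    with open(fullname, "w") as f:
                        f.write(result)

if __name__ == "__main__" and len(sys.argv) > 1:
    process_tree(sys.argv[1],
                 "Copyright (c) 2012, The Linux Foundation. All rights reserved")
```

Run it as `python script.py /path/to/tree`. Because find_and_remove escapes the needle itself, the needle is passed as plain text with no manual backslashes.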