I want to process only the parts of a string that are NOT C++ comments. Here is the pattern I use to find C++ comments (it also matches string and character literals):

import re
pattern = re.compile(r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"', re.DOTALL | re.MULTILINE)

However, I don't know how to make it work the way I intend.
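A minimal sketch of one way to use that pattern, assuming the goal is to rewrite everything except comments (string and character literals included): iterate over its matches with `re.finditer`, apply `str.replace` to the plain code between matches, and copy comment matches through unchanged. The helper name `replace_outside_comments` is made up for illustration.

```python
import re

# The question's pattern: // line comments, /* block */ comments, and
# '...' / "..." literals. Literals are matched so that '//' or '/*'
# inside quotes is not mistaken for the start of a comment.
TOKEN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE)


def replace_outside_comments(text, old, new):
    """Hypothetical helper: replace old with new everywhere except inside comments."""
    out = []
    last = 0
    for match in TOKEN.finditer(text):
        # Plain code between the previous match and this one: rewrite it
        out.append(text[last:match.start()].replace(old, new))
        token = match.group(0)
        # Comments start with '/', literals with a quote; only comments are kept as-is
        out.append(token if token.startswith('/') else token.replace(old, new))
        last = match.end()
    # Plain code after the last match
    out.append(text[last:].replace(old, new))
    return ''.join(out)


print(replace_outside_comments('f(); // f();', 'f', 'g'))  # g(); // f();
```

Whether text inside literals should also be rewritten is a design choice; returning `token` unchanged in both cases would leave literals untouched as well.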

# Python 3.4.2
s = '''
/****
C++ comments
  //pResMgr->CreateDialogEx();
****/
//pResMgr->CreateDialogEx();
/*//pResMgr->CreateDialogEx();*/

// real code, I want to replace only this following line of code
pResMgr->CreateDialogEx();
'''

# Naive attempt: this replaces every occurrence, including those inside comments
newS = s.replace('CreateDialogEx', 'Create')
print(newS)

My expected output is:

/****
C++ comments
  //pResMgr->CreateDialogEx();
****/
//pResMgr->CreateDialogEx();
/*//pResMgr->CreateDialogEx();*/

// real code, I want to replace only this following line of code
pResMgr->Create();
O'Skywalker
  • I would not use regex if I were you. I think you'd be better off iterating over your string, removing everything after `//` up to the newline and everything after `/*` up to `*/`... and THEN applying the regex. – n00dl3 Jul 13 '15 at 07:31
  • @JuniusRendel, what about `pResMgr->CreateDialogEx(); // pResMgr->CreateDialogEx();`? – O'Skywalker Jul 13 '15 at 07:43
  • I don't understand what you mean... – n00dl3 Jul 13 '15 at 07:58
  • @JuniusRendel, take `pResMgr->CreateDialogEx(); // pResMgr->CreateDialogEx()`. Following your advice, I can delete the comments first, that's true, but my result must still contain the original, unchanged comment. When do I add the deleted text back? – O'Skywalker Jul 13 '15 at 08:00
  • Possible duplicate of http://stackoverflow.com/questions/16720541/python-string-replace-regular-expression. – Eenoku Jul 13 '15 at 08:08
  • What I mean is that the "do not process" logic would be simpler and faster without regex. – n00dl3 Jul 13 '15 at 08:28
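The strip-then-restore idea discussed in these comments can keep the original comments if each comment is swapped for a placeholder instead of being deleted, and put back right after the replacement. A sketch under two stated assumptions: the comment regex is a simplified variant of the question's pattern that ignores string literals, and the placeholder form `\x00<n>\x00` never occurs in the C++ source. `mask_comments` and `restore_comments` are hypothetical names.

```python
import re

# Simplified comment matcher (assumption: string literals are ignored):
# // line comments up to the newline, and /* block */ comments
COMMENT = re.compile(r'//[^\n]*|/\*.*?\*/', re.DOTALL)


def mask_comments(text):
    """Swap each comment for a numbered placeholder; return the text and the comments."""
    saved = []

    def to_placeholder(match):
        saved.append(match.group(0))
        # Assumes the NUL character never occurs in the C++ source
        return '\x00%d\x00' % (len(saved) - 1)

    return COMMENT.sub(to_placeholder, text), saved


def restore_comments(text, saved):
    """Put each saved comment back where its placeholder is."""
    return re.sub(r'\x00(\d+)\x00', lambda m: saved[int(m.group(1))], text)


masked, saved = mask_comments('a = CreateDialogEx(); // CreateDialogEx();')
result = restore_comments(masked.replace('CreateDialogEx', 'Create'), saved)
print(result)  # a = Create(); // CreateDialogEx();
```

Because `re.sub` inserts the return value of a function replacement literally, comments containing backslashes are restored byte-for-byte.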

1 Answer

I didn't test it, but it works with your case and should work in general. It walks through the text, finding the next newline, `//` or `/*`, and then handles each case. Really simple, no regex.

source_code = '''//pResMgr//->CreateDialogEx();'''

def indexOf(string, token):
    # Position of token, or len(string) if it is absent (str.find returns -1)
    pos = string.find(token)
    return pos if pos != -1 else len(string)

def replaceNotInComments(string, searchFor, replaceWith):
    result = ''
    while string:
        # Find the nearest newline, block-comment start or line-comment start
        nextBreak = min(indexOf(string, '\n'),
                        indexOf(string, '/*'),
                        indexOf(string, '//'))
        # Everything before it is code: replace only there
        result += string[:nextBreak].replace(searchFor, replaceWith)
        string = string[nextBreak:]

        if string.startswith('\n'):
            # Keep the newline itself
            result += '\n'
            string = string[1:]
        elif string.startswith('/*'):
            # Copy the block comment, delimiters included, unchanged
            end = indexOf(string[2:], '*/') + 4
            result += string[:end]
            string = string[end:]
        elif string.startswith('//'):
            # Copy the line comment unchanged; its newline is handled on the next pass
            end = indexOf(string, '\n')
            result += string[:end]
            string = string[end:]

    return result

result = replaceNotInComments(source_code, 'CreateDialogEx', 'Create')
print(result)
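Like the question's regex, a scanner also needs to know about quotes: the scanning above treats the `//` in a literal such as `"http://x"` as the start of a comment. A hedged sketch of a single-pass, no-regex variant that also steps over quoted literals (it does not handle C++11 raw string literals such as `R"(...)"`); `replace_not_in_comments` is a hypothetical name, distinct from the answer's `replaceNotInComments`.

```python
def replace_not_in_comments(src, old, new):
    """Hypothetical helper: rewrite code and literals, copy comments verbatim."""
    out = []
    code_start = 0  # where the current run of non-comment text began
    i = 0
    n = len(src)
    while i < n:
        if src.startswith('//', i) or src.startswith('/*', i):
            # Rewrite the non-comment run that ends here
            out.append(src[code_start:i].replace(old, new))
            if src[i + 1] == '/':
                # Line comment: stop before the newline so it stays with the code
                end = src.find('\n', i)
                end = n if end == -1 else end
            else:
                # Block comment: include the closing */ (or run to the end if unclosed)
                end = src.find('*/', i + 2)
                end = n if end == -1 else end + 2
            out.append(src[i:end])  # comment copied unchanged
            i = code_start = end
        elif src[i] in '"\'':
            # Step over a quoted literal so '//' or '/*' inside it is ignored
            quote = src[i]
            i += 1
            while i < n and src[i] != quote:
                i += 2 if src[i] == '\\' else 1  # skip escaped characters
            i += 1  # past the closing quote
        else:
            i += 1
    # Rewrite whatever non-comment text is left
    out.append(src[code_start:].replace(old, new))
    return ''.join(out)


print(replace_not_in_comments('url = "http://x"; f(); // f();', 'f', 'g'))
```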
Martin Gottweis
  • Your result changes '/' to '\'! Here is your result: `\ **** C++ comments //pResMgr->CreateDialogEx(); ****/ pResMgr->CreateDialogEx(); \ *//pResMgr->CreateDialogEx();*/ // real code, I want to replace only this following line of code pResMgr->Create();` – O'Skywalker Jul 13 '15 at 08:32
  • Oh, that was a typo. Does it work now? – Martin Gottweis Jul 13 '15 at 08:35
  • I found a serious issue: if source_code = '''//pResMgr->CreateDialogEx();''', your code still changes it. – O'Skywalker Jul 13 '15 at 10:10
  • Fixed again. By the way, you can also try debugging the code yourself. – Martin Gottweis Jul 13 '15 at 11:08
  • Ouch, I found another bug. Suppose the input is source_code = 'cout << "hello, money\r\n, new line;"'; then the output loses the '\r\n' and becomes 'cout << "hello, money , new line;"'. How can I change your code directly? – O'Skywalker Jul 14 '15 at 02:18
  • What would you want the output to be? Right now it basically outputs the same string, since there is no comment. If you are missing the \r\n, it is probably being turned into a line break in your environment, but it is there. – Martin Gottweis Jul 14 '15 at 08:29
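Whether the `\r\n` is really there can be checked in Python itself, independent of any scanner. In an ordinary (non-raw) literal like the one in the comment, `\r\n` is two control characters, carriage return and line feed, which a terminal acts on rather than displays; `repr()` makes them visible. In a raw literal (`r'...'`), the same text is four ordinary characters. A quick check:

```python
# The string from the comment, as an ordinary (non-raw) Python literal
source_code = 'cout << "hello, money\r\n, new line;"'
# The same text as a raw literal: backslash, r, backslash, n stay as four characters
raw_code = r'cout << "hello, money\r\n, new line;"'

print(len('\r\n'), len(r'\r\n'))  # 2 4
print('\r\n' in source_code)      # True: carriage return + line feed are present
print(repr(source_code))          # shows the escapes instead of moving the cursor
print(raw_code)                   # backslashes are printed literally
```

C++ source read from a file contains the backslash sequence itself, so the raw literal is probably the closer test input.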