
I have trouble understanding regex! Every time I think I've got it, I haven't!


The Problem:

I'm writing a formatter for a custom language (AVEVA InTouch). Now I'm trying to find all keywords so that I can uppercase them.

The expression is:

/(\b(as|eof|if|endif|then|dim)\b)/gmi

That's OK. Now, please, not in comments ({ comment }):

/(?![^{]*})(\b(as|eof|if|endif|then|dim)\b)/gmi
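As a quick check of what the lookahead in that expression does, here is a minimal sketch. The sample line is made up, and the closing brace is escaped in the pattern (same meaning, just the stricter spelling). It assumes comments are balanced and never nested.

```typescript
// Keyword list copied from the question; sampleLine is a made-up test input.
const commentAwareKeywords = /(?![^{]*\})\b(as|eof|if|endif|then|dim)\b/gi;
const sampleLine = "if a { if b then } then dim c as integer";

// A keyword is skipped when a "}" can be reached from its position without
// crossing a "{" first, i.e. when the position sits inside an open { ... } comment.
const upperCased = sampleLine.replace(commentAwareKeywords, (kw: string) => kw.toUpperCase());

console.log(upperCased); // → "IF a { if b then } THEN DIM c AS integer"
```

Note that this only works because a comment has different opening and closing characters; a quoted string uses the same character on both sides, so the same trick does not carry over directly.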

It works. Now, please, not in strings either.

I found a solution for selecting text between quotes:

RegEx: Grabbing values between quotation marks

But I CAN'T invert it:

/(?!((["'])(?:(?=(\\?))\2.)*?\1))(\b(as|eof|if|endif|then|dim)\b)/gmi

I have been trying for hours and have searched for similar problems, but it will not work. I think I have a small but fundamental misunderstanding.
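One way to see where such an attempt goes wrong: a negative lookahead only tests the text that starts at the current position. It cannot tell whether that position is already inside a string. The sketch below uses a simplified pattern (double quotes only, two keywords) and a made-up sample line; it is an illustration, not the exact expression from above.

```typescript
// Made-up sample: two keywords in code, two more inside a string literal.
const stringSample = 'if a then msg = "if not then";';

// Simplified attempt: "match a keyword that is not the start of a quoted string".
const lookaheadAttempt = /(?!"[^"]*")\b(if|then)\b/gi;

// Collect the offset of every match; the replacement itself changes nothing.
const matchedOffsets: number[] = [];
stringSample.replace(lookaheadAttempt, (kw: string, _group: string, offset: number) => {
    matchedOffsets.push(offset);
    return kw;
});

// Offsets 17 and 24 lie inside the string, yet they still match.
console.log(matchedOffsets.join(",")); // → "0,5,17,24"
```

A keyword never begins with a quote character, so the lookahead succeeds at every keyword position and filters nothing.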


Question:

How can I create a regex that does both: select all keywords from the text, but only where the keyword is not in a comment AND not in a string?

Please help me understand how selection and negation are combined, and how several such conditions are combined. Or is regex simply not the best practice for this problem?

wowandy
Vivil
  • Sorry, I forgot to include the test text: https://github.com/vitalyruhl/intouch-language/blob/0badd2cfe19e5885b3d719c8edc1c51d6cb8d249/test/test.vbi And my attempts are here: https://github.com/vitalyruhl/intouch-language/blob/0badd2cfe19e5885b3d719c8edc1c51d6cb8d249/src/functions.ts – Vivil Oct 25 '21 at 14:20
  • Right, a single plain regex is not the best tool for this. – Wiktor Stribiżew Oct 25 '21 at 14:23
  • parse the text as a sequence of Code/String/Comment blocks, feed all the Code sections to a regex. keep track of the begin and end position of each block in the big text – rioV8 Oct 25 '21 at 15:21
  • have you ever considered formatting your JSON files – rioV8 Oct 25 '21 at 15:32
  • Thanks, @rioV8, but what do you mean by JSON file? Syntax highlighting is finished; the last release was 2020/07. Now I want to implement the formatting feature. I don't know how I would do this with JSON. – Vivil Oct 25 '21 at 17:05
  • your indentation of the JSON in the repo is a mess – rioV8 Oct 25 '21 at 19:16
  • Do you mean **~/.vscode/extensions/intouch-language/syntaxes/intouch.tmLanguage.json**? Please specify and, if possible, give an example. Or did you mean test.vbi? That is the test file for checking whether the code now gets formatted as well; it is just a hodgepodge with no real function. – Vivil Oct 25 '21 at 19:51
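The approach suggested in the comments (split the text into Code/String/Comment blocks, remember each block's begin and end position, and run the keyword regex on the Code blocks only) could look roughly like the sketch below. The names `Segment`, `splitIntoSegments` and `upperCaseKeywords` are hypothetical. It assumes `{ ... }` comments do not nest, strings use double quotes, and there are no escaped quotes.

```typescript
type SegmentKind = "code" | "string" | "comment";

interface Segment {
    kind: SegmentKind;
    start: number; // index of the first character of the block
    end: number;   // index one past the last character of the block
}

// Hypothetical helper: walk the text once and cut it into blocks.
function splitIntoSegments(source: string): Segment[] {
    const segments: Segment[] = [];
    let kind = "code" as SegmentKind;
    let start = 0;
    for (let i = 0; i < source.length; i++) {
        const ch = source.charAt(i);
        let next: SegmentKind | null = null;
        let boundary = i; // an opening delimiter starts the new block
        if (kind === "code") {
            if (ch === "{") { next = "comment"; }
            else if (ch === '"') { next = "string"; }
        } else if ((kind === "comment" && ch === "}") || (kind === "string" && ch === '"')) {
            next = "code";
            boundary = i + 1; // a closing delimiter belongs to the block it closes
        }
        if (next !== null) {
            if (boundary > start) { segments.push({ kind, start, end: boundary }); }
            kind = next;
            start = boundary;
        }
    }
    if (start < source.length) { segments.push({ kind, start, end: source.length }); }
    return segments;
}

// Uppercase keywords in code blocks only; strings and comments are copied as-is.
function upperCaseKeywords(source: string): string {
    const keywordPattern = /\b(as|eof|if|endif|then|dim)\b/gi;
    let result = "";
    for (const seg of splitIntoSegments(source)) {
        const chunk = source.slice(seg.start, seg.end);
        result += seg.kind === "code"
            ? chunk.replace(keywordPattern, (kw: string) => kw.toUpperCase())
            : chunk;
    }
    return result;
}

console.log(upperCaseKeywords('if a then { if } x = "dim as";'));
// → IF a THEN { if } x = "dim as";
```

Because every block carries its own `start` and `end`, the same list could later be reused for other formatting passes or for error messages with positions.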

1 Answer


I have now come to the conclusion that regex is really not suitable for this. I will do it the tried and true way (for in...). I did find some patterns for this, but recursive regexes blow my mind. I definitely don't understand them.

That is my first try...

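// Note: LF, log, KEYWORDS, SINGLE_OPERATORS, DOUBLE_OPERATORS, FORMATS and TRENNER
// are defined elsewhere in the project and are not shown here.
export function forFormat(text: string, config: any): string {
    //let txt = runes(text); //split text into single characters
    let txt = text.split(''); //split text into single characters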
    let buf: string = '';
    let i: any = 0;
    let modified: number = 0;

    let inComment: boolean = false;
    let inString: boolean = false;

    let LineCount: number = 1;
    let ColumnCount: number = 0;

    for (i = 0; i <= txt.length - 1; i++) {

        //Columncount
        ColumnCount++;

        if (modified > 0) {
            modified--;
        }
        else {
            modified = 0;
            //check for String-End (check before begin!)
            if (inString && (txt[i] === '"')) {
                if (!(txt[i - 1] === '\\')) {  //check for escaped quot
                    inString = false;
                    //log("info", `Info @ Line ${LineCount} at Column ${ColumnCount} -> closed string detected!`);
                }
            }
            else if (!inComment) {//check for String-Begin, but not the same char as close!
                if (txt[i] === '"') {
                    inString = true;
                    //log("info", `Info @ Line ${LineCount} at Column ${ColumnCount} -> Open string detected!`);
                }
            }

            //Linecount 
            if (txt[i] === LF) { //txt[i] === CRLF || txt[i] === CR || txt[i] === LF

                if (inString) {//check for string error, because a string cannot be declared over multiple lines!
                    log("Error", `Error @ Line ${LineCount} at Column ${ColumnCount} -> no closed string detected!`);
                    log("Error", buf);
                    return text; //return unformatted text
                }

                LineCount++;
                ColumnCount = 0;
            }


            //check for comment errors (brackets inside strings are plain text, so skip them)
            if (!inString) {
                if (!inComment && (txt[i] === '}')) {
                    log("Error", `Error @ Line ${LineCount} at Column ${ColumnCount} -> closed comment bracket without open comment bracket!`);
                    log("Error", buf);
                    return text; //return unformatted text
                }
                else if (txt[i] === '{') { //check for comment begin
                    inComment = true;
                }
                else if (inComment && (txt[i] === '}')) {//check for comment end
                    inComment = false;
                }
            }


            //formatting session
            if (!inString) {

                if (!(modified > 0) && (!inComment || config.KeywordUppercaseAlsoInComment)) {
                    let j: any;

                    let wbf: string = '';//word-boundary test char before
                    let wba: string = '';//word-boundary test char after

                    //check for KEYWORDS
                    for (j in KEYWORDS) {
                        let k: any;
                        let tt: string = '';
                        wbf = text[i - 1];
                        for (k = 0; k <= KEYWORDS[j].length - 1; k++) {
                            tt += text[i + k];
                            wba = text[i + k + 1];
                        }

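                        if (tt.toLowerCase() === KEYWORDS[j].toLowerCase()) {
                            if (CheckCRLForWhitespace(wbf) && CheckCRLForWhitespace(wba)) {//check word boundary
                                //emit only the keyword, so that the char after it (LF, quote, bracket, operator)
                                //is still processed by the next loop pass; assumes keywords have at least 2 chars
                                buf += KEYWORDS[j].toUpperCase();
                                modified = KEYWORDS[j].length - 1;
                                break;
                            }
                        }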
                    }


                    //check for Double
                    if (!(modified > 0)) {
                        for (j in DOUBLE_OPERATORS) {//check double operators first
                            let k: any;
                            let tt: string = '';
                            wbf = text[i - 1];

                            for (k = 0; k <= DOUBLE_OPERATORS[j].length - 1; k++) {
                                tt += text[i + k];
                                wba = text[i + k + 1];
                            }

                            if (tt === DOUBLE_OPERATORS[j]) {

                                if (text[i - 1] !== ' ') {
                                    buf += ' ';
                                }

                                buf += DOUBLE_OPERATORS[j];

                                if (text[i + 2] !== ' ') {
                                    buf += ' ';
                                }

                                modified = 1;

                                break;
                            }
                        }
                    }

                    //check for single operators
                    if (!(modified > 0)) {
                        for (j in SINGLE_OPERATORS) {//single operators are checked after the double operators

                            if (text[i] === SINGLE_OPERATORS[j]) {

                                if (text[i - 1] !== ' ') {
                                    buf += ' ';
                                }

                                buf += text[i];

                                if (text[i + 1] !== ' ') {
                                    buf += ' ';
                                }

                                modified = -1;
                                //console.log('Operator:[', text[i], ']modified:', modified);
                                break;
                            }
                        }
                    }
                }

            }//formatting session

            if (modified === 0) {//insert only when on this session not modified!
                buf += txt[i];
            }

        }//not modified

    }//for
    //console.log(buf);

    return buf;
}//function


function CheckCRLForWhitespace(s: string): boolean {
    let checks: string[] = [];
    let check: boolean[] = [];
    let test: boolean = false;

    checks = FORMATS.concat(SINGLE_OPERATORS);
    checks = checks.concat(DOUBLE_OPERATORS);

    checks = checks.concat(TRENNER);

    check = checks.map(item => {
        if (s === item) {
            return true;
        }
        return false;
    });


    test = test || check.some(it => it === true);

    //console.log('check', s, test);
    return test;
}
Vivil
  • reduce `CheckCRLForWhitespace` to `return FORMATS.concat(SINGLE_OPERATORS, DOUBLE_OPERATORS, TRENNER).some(item => s === item)` – rioV8 Oct 27 '21 at 09:46
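
Written out, the reduction suggested in the comment above might look like the sketch below. The four arrays here are placeholder values so that the snippet runs on its own; the real `FORMATS`, `SINGLE_OPERATORS`, `DOUBLE_OPERATORS` and `TRENNER` lists are not shown in this thread.

```typescript
// Placeholder values (assumptions): the project's real lists are defined elsewhere.
const FORMATS: string[] = [" ", "\t", "\n", "\r"];
const SINGLE_OPERATORS: string[] = ["=", "+", "-", "*", "/"];
const DOUBLE_OPERATORS: string[] = ["==", "<>", "<=", ">="];
const TRENNER: string[] = ["(", ")", ";", ","];

// True when s is one of the characters/tokens that may delimit a keyword.
function CheckCRLForWhitespace(s: string): boolean {
    return FORMATS.concat(SINGLE_OPERATORS, DOUBLE_OPERATORS, TRENNER).some(item => s === item);
}

console.log(CheckCRLForWhitespace(" ")); // → true
console.log(CheckCRLForWhitespace("x")); // → false
```

The behaviour is the same as the longer version: `some` stops at the first hit, so the intermediate `check` array and the `test` flag are not needed.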