
I'm trying to find every string in a program that is at least N characters long (5 in this test), but only on lines that don't contain the word "printf" (or maybe "printf\s*(").

These all fail:

  /^.*(?!printf).+"[^"]{5,}".*$/,
  /^(?!printf)*"[^"]{5,}"/,
  /^((?!printf).)*"[^"]{5,}((?!printf).)*"$/,

I saw this, but it seems irrelevant. I saw this, which seems closer but doesn't work. This one too.

If I split it into two problems (first filter out all lines containing printf, then search the remaining lines for strings of 5 or more characters), it's easy. But I'd like to use this regex in VS Code or another editor that supports regular expressions, so I need to do it in one expression.
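The two-pass approach described above is easy to write in plain JavaScript. The sketch below assumes the same test lines and the same simplifications as the question (double quotes only, no escaped quotes). `findLongStrings` is a hypothetical helper name, and since this is two filters rather than one regex, it doesn't help inside an editor's search box.

```javascript
// Hypothetical helper: keep lines that do NOT contain `keyword` and that
// contain a double-quoted string of at least `minLen` characters.
function findLongStrings(lines, keyword, minLen) {
  // A quote, minLen or more non-quote characters, then a closing quote.
  const longString = new RegExp(`"[^"]{${minLen},}"`);
  return lines
    .filter((line) => !line.includes(keyword)) // pass 1: drop keyword lines
    .filter((line) => longString.test(line)); // pass 2: keep long-string lines
}

const lines = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
];

console.log(findLongStrings(lines, 'printf', 5));
```

With the question's data this keeps the blueberry, banana and pineapple lines, which is the expected result.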

    const regexs = [
      /^.*(?!printf).+"[^"]{5,}".*$/,
      /^(?!printf)*"[^"]{5,}"/,
      /^((?!printf).)*"[^"]{5,}((?!printf).)*"$/,
    ];

    const lines = [
      'write("blueberry"); // yum',                    // match
      'printf("-%s-", "strawberry"); // whatever',     // do not match
      'x = 12; printf("lime"); write("coconut")',      // do not match
      'x = 12; write("coconut") printf("lime");',      // do not match
      'y = 34; write("banana")',                       // match
      'z = "pineapple";',                              // match
      'p = "seed";'                                    // do not match
    ];

    for (const re of regexs) {
      console.log('--------------: ', re.toString());
      for (const line of lines) {
        console.log(re.test(line).toString().padEnd(7), line);
      }
    }

PS: I'm not worried about strings in comments, multiline strings, escaped quotes, or single-quoted strings. I just need to browse 200k lines of code at a glance for strings longer than a certain size on lines that don't contain certain keywords.

PPS: I get that the first two would not work for the 4th line; I'm just trying to get something working for the other lines first, on my way to handling the 4th line as well.

For a more concrete example, replace 'printf' with 'localized': I'm searching for all strings N characters or longer on lines that don't contain the word 'localized', so I can see at a glance which lines in the code still need localization. Lines that have already been localized contain the word 'localized'. I don't need to find every string, because they generally come in batches; seeing a few lines in a given file tells me where to look to find most cases.

samanthaj
  • Why doesn't `p = "seed";` match? It contains 11 characters. – namgold Sep 15 '20 at 04:05
  • Needs to be 5+ characters *in a string*. "seed" is 4 characters. – samanthaj Sep 15 '20 at 04:10
  • What about `write("printf")`? – namgold Sep 15 '20 at 04:27
  • Try `(?<!printf.*)("\w{5,}")(?!.*printf)` in the Find widget. – Mark Sep 15 '20 at 05:19
  • Interestingly, it works in a single-file search but [not multi-file search](https://github.com/microsoft/vscode/issues/100569). I used a different editor. Also, it doesn't seem to work on the test code above. In any case, thank you. – samanthaj Sep 15 '20 at 06:02
  • Yes, it works in the Find widget only - because it has a non-fixed length lookbehind. It sounded like you were working in one large file at a time - that isn't true? – Mark Sep 15 '20 at 21:45
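Mark's pattern can be checked against the question's test lines in Node.js, whose JavaScript regex engine (ES2018 and later) accepts a lookbehind of non-fixed length such as `(?<!printf.*)`. This sketch assumes the Find widget evaluates the pattern with the same kind of JavaScript engine. Note that `\w{5,}` only matches strings made of letters, digits and underscores, so a string containing a space, such as "Open file", is not reported.

```javascript
// Mark's suggested pattern as a JavaScript regex literal. The lookbehind
// (?<!printf.*) has no fixed length, which JavaScript engines allow.
const markRe = /(?<!printf.*)("\w{5,}")(?!.*printf)/;

const lines = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
];

// Print true/false next to each line, like the question's harness does.
for (const line of lines) {
  console.log(String(markRe.test(line)).padEnd(7), line);
}
```

Run this way, it prints true for the blueberry, banana and pineapple lines and false for the others.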

2 Answers


Try this: `(?<!printf[^\n]*)"(?![^"\n]*printf)[^"\n]{5,}"(?![^\n]*printf)`

Test on regex101.com
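As a quick local check, here is the same pattern run in Node.js against the question's lines. The lookbehind has a variable length, so this sketch assumes an ES2018+ JavaScript engine. The extra line `f("a", 12345, "b")` is a made-up case showing a limitation: the opening `"` can be the closing quote of an earlier string, so text between two short strings can also be reported.

```javascript
// namgold's pattern as a JavaScript regex literal:
//   (?<!printf[^\n]*)   no "printf" earlier on the same line
//   "(?![^"\n]*printf)  opening quote; the text up to the next quote has no "printf"
//   [^"\n]{5,}"         5 or more characters, then the closing quote
//   (?![^\n]*printf)    no "printf" later on the same line
const namgoldRe = /(?<!printf[^\n]*)"(?![^"\n]*printf)[^"\n]{5,}"(?![^\n]*printf)/;

const lines = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
];

for (const line of lines) {
  console.log(String(namgoldRe.test(line)).padEnd(7), line);
}

// Hypothetical extra line: `, 12345, ` between two strings is matched too.
console.log(namgoldRe.test('f("a", 12345, "b")'));
```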

namgold
  • First, check that the line does not contain printf: `^(?!.*printf)`
  • Then skip as few complete strings (`text"text"`) as possible: `([^"\n]*"[^"\n]*")*?`
  • Finally, find a string with 5 or more characters: `[^"\n]*"[^"\n]{5,}"`
^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"

See regex101
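The three parts listed above can also be assembled programmatically, which makes it easy to swap in 'localized' and a different N as described in the question. `escapeRegExp` and `buildLineRegex` are hypothetical helper names for this sketch. The keyword is escaped so it is matched literally, which means a regex keyword like `printf\s*(` would need a different approach. The `title = ...` lines are made-up samples.

```javascript
// Hypothetical helper: escape regex metacharacters so the keyword is literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper assembling rioV8's pattern from its three parts.
// String.raw keeps \n as backslash + n, so RegExp sees the newline class.
function buildLineRegex(keyword, minLen) {
  const noKeyword = String.raw`^(?!.*${escapeRegExp(keyword)})`; // part 1
  const skipStrings = String.raw`([^"\n]*"[^"\n]*")*?`; // part 2
  const longString = String.raw`[^"\n]*"[^"\n]{${minLen},}"`; // part 3
  return new RegExp(noKeyword + skipStrings + longString);
}

const lines = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
];

const printfRe = buildLineRegex('printf', 5);
console.log(printfRe.source); // same text as the pattern in the answer
for (const line of lines) {
  console.log(String(printfRe.test(line)).padEnd(7), line);
}

const localizedRe = buildLineRegex('localized', 5);
console.log(localizedRe.test('title = "Settings";')); // true
console.log(localizedRe.test('title = localized("Settings");')); // false
```

Because part 2 consumes whole `text"text"` pairs, the opening quote it finds is always a real opening quote, so `f("a", 12345, "b")` is not reported by this pattern.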


If you want to see the affected lines in the PROBLEMS panel of VS Code, you can use a task.

On Windows, I used the grep that comes with the Git installation.

    {
      "label": "Find to localize",
      "type": "shell",
      "windows": {
        "command": "\"C:\\Program Files\\Git\\usr\\bin\\grep\""
      },
      "linux": {
        "command": "grep"
      },
      "args": [ "-nrP", "--file=${workspaceFolder}/.vscode/local5-grep.txt", "*" ],
      "options": { "cwd": "${workspaceFolder}" },
      "presentation": { "clear": true },
      "problemMatcher": {
        "owner": "localize",
        "fileLocation": ["relative", "${workspaceFolder}"],
        "pattern": [
          {
              "regexp": "^([^:]+):(\\d+):(.*)$",
              "file": 1,
              "line": 2,
              "message": 3
          }
        ]
      }
    }
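To see how the problemMatcher turns grep output into PROBLEMS entries, the sketch below applies the same `regexp` in JavaScript to one line in the `file:line:text` format that `grep -n` prints when searching several files. The sample line and file name are made up. Group 1 becomes the file, group 2 the line and group 3 the message, as configured in the task.

```javascript
// The task's "regexp": "^([^:]+):(\\d+):(.*)$" is a JSON string; with the
// JSON escaping removed it is this JavaScript regex.
const problemPattern = /^([^:]+):(\d+):(.*)$/;

// Hypothetical grep -nrP output line: file name, line number, matched text.
const grepLine = 'src/menu.c:42:    label = "Open recent files";';

const [, file, line, message] = grepLine.match(problemPattern);
console.log(file); // src/menu.c
console.log(line); // 42
console.log(message); //     label = "Open recent files";
```

Since `[^:]+` stops at the first colon, this relies on grep printing relative paths (as it does here with `*` run from the workspace folder); an absolute Windows path such as `C:\src\menu.c` would not match at all.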

Because the regular expression contains a lot of `"` characters, which would all need escaping inside tasks.json, it is better to save it in a file.

I used .vscode/local5-grep.txt, but you can use any file; change the location in the task if needed.

The file .vscode/local5-grep.txt contains:

^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"
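To illustrate why a separate file is simpler, this sketch prints what the same pattern would look like as a JSON string inside the task's `args`: every `"` needs a backslash and every `\` is doubled. `String.raw` is used so the JavaScript source holds exactly the characters in the file.

```javascript
// rioV8's pattern, character for character as stored in local5-grep.txt.
const pattern = String.raw`^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"`;

// The same pattern written as a JSON string, as tasks.json would need it.
const asJson = JSON.stringify(pattern);
console.log(asJson);
// "^(?!.*printf)([^\"\\n]*\"[^\"\\n]*\")*?[^\"\\n]*\"[^\"\\n]{5,}\""
```

The eight quotes and four backslashes in the pattern all change, which is easy to get wrong by hand.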

To search only particular files, change the "*" argument in the task.

rioV8