
I am looking for a regex that matches any line of code containing a single reference to a core module.

Something like this:

const coreModuleMatches = /'^[var|const]{0,1}[a-z\$\_]{1,}=require([\'|"][assert|fs|path][\'|"])[;|,]{0,1}$/;

This should match all of these lines

var pth = require("path"); 
const asrt = require('assert'),
     fs = require('fs'),
     cp = require('child_process');

The problem is I can't get the simple regex to work, so my more complex regex currently has no hope.

I am stripping out all whitespace except newline characters before matching the code with regular expressions, and then splitting by newline so that I can go line by line through the code. Any ideas welcome.

Alexander Mills
  • You forgot to escape `(` and `)`. –  Mar 20 '16 at 09:00
  • Is the string from `var. . . . . ('child_process');` one match or two different matches? –  Mar 20 '16 at 10:03
  • http://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions and http://stackoverflow.com/questions/9801630/what-is-the-difference-between-square-brackets-and-parentheses-in-a-regex – Wiktor Stribiżew Mar 20 '16 at 11:03
  • var. . . . . ('child_process'); is one match, and the subexpression I really care about is simply 'child_process', I just want an array of core-modules that are referenced in the code. – Alexander Mills Mar 20 '16 at 18:50
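
The comments above point at two underlying problems: square brackets create a character class rather than a group of alternatives, and unescaped parentheses form a capture group instead of matching a literal `(` or `)`. A minimal sketch of both pitfalls follows; the variable names are made up for illustration and are not part of the original question.

```javascript
// [var|const] is a character class: it matches ONE character out of
// v, a, r, |, c, o, n, s, t. It does not mean "var or const".
const charClass = /^[var|const]$/;
console.log(charClass.test('v'));   // true
console.log(charClass.test('var')); // false

// (?:var|const) is a group with alternation: it matches either whole word.
const alternation = /^(?:var|const)$/;
console.log(alternation.test('var'));   // true
console.log(alternation.test('const')); // true

// Unescaped parentheses form a capture group, so they never match a literal "(".
const unescaped = /^require("path")$/;
console.log(unescaped.test('require("path")')); // false
console.log(unescaped.test('require"path"'));   // true

// Escaped parentheses match the literal characters.
const escaped = /^require\("path"\)$/;
console.log(escaped.test('require("path")')); // true
```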

2 Answers


Apart from forgetting to escape ( and ), there were a few other mistakes in your regex.

Your Regex:

/'^[var|const]{0,1}[a-z\$\_]{1,}=require([\'|"][assert|fs|path][\'|"])[;|,]{0,1}$/

My Regex:

/^(?:var|const)\s*([a-z$_]+\s*=\s*require\(('|")(?:assert|fs|path|child_process)\2\),?[\n\r\t\s]*)*;$/

Explanation:

  • (?:var|const)\s*([a-z$_]+\s*=\s* This matches var or const, then the variable name, then the =, including any whitespace in between.

  • require\(('|")(?:assert|fs|path|child_process)\2\),? This matches the require() and whatever the module is inside it. As the first quote is captured using ('|"), \2 implies that it's the one repeated while closing too, so that mismatching of quotes does not takes place.

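  • [\n\r\t\s]*)*; This matches the whitespace between declarations (newlines, carriage returns, tabs, spaces), repeats the whole group for each further variable, and then matches the final semicolon. Note that \s on its own already covers the other three characters.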
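
To make the explanation concrete, here is a small sketch that runs the regex above, copied verbatim, against the lines from the question. The test strings and the name `declRegex` are illustrative assumptions.

```javascript
// The regex from the answer above, copied verbatim.
const declRegex = /^(?:var|const)\s*([a-z$_]+\s*=\s*require\(('|")(?:assert|fs|path|child_process)\2\),?[\n\r\t\s]*)*;$/;

const single = 'var pth = require("path");';
const multi = [
  "const asrt = require('assert'),",
  "     fs = require('fs'),",
  "     cp = require('child_process');"
].join('\n');

console.log(declRegex.test(single)); // true
console.log(declRegex.test(multi));  // true

// \2 must repeat the opening quote, so mixed quotes are rejected.
console.log(declRegex.test('var pth = require("path\');')); // false

// A module outside the alternation is rejected as well.
console.log(declRegex.test("var _ = require('lodash');")); // false
```

Because the outer group repeats, its captures keep only the last iteration, so this pattern is better suited to checking a declaration than to listing every module it references.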


  • thanks, I actually don't need to match whitespace. If you could simplify your regex by removing whitespace matches and explain how yours works that would help! – Alexander Mills Mar 20 '16 at 18:35
  • @AlexMills: Which whitespace would you like removed? In your question, the match from `const.....` to the end contains a lot of whitespace. Besides that, you haven't answered my comment on your question. –  Mar 20 '16 at 18:51
  • yeah, don't worry about that, it still works with the optional whitespace...I don't know what the '\2\' is doing. Your answer is unlikely to help anyone unless you can explain the regex a bit, thanks – Alexander Mills Mar 20 '16 at 18:52
  • @AlexMills: I have updated the answer with explanation. Let me know if it's clear or not. –  Mar 20 '16 at 19:08

You don't need such a complicated regex; something like /("|')(assert|fs|http)\1/ should be enough:

// don't list all modules by hand
var builtinModules = require('builtin-modules');

console.time('end');
var input =
`var pth = require("path");
var _ = require('lodash');
const asrt = require('assert'),
fs = require('fs'),
cp = require('child_process');`.split('\n');

// \1 is a backreference to the opening quote (single or double),
// so mismatched quotes such as "path' do not match
var rgxStr = `("|')(${ builtinModules.join('|') })\\1`;
var rgx = new RegExp(rgxStr);
// console.log(rgxStr); // uncomment to see what the regex looks like

var output = input.filter((line) => line.match(rgx));

console.timeEnd('end');
console.log('input');
console.log(input);
console.log('--------------------------------------');
console.log('output');
console.log(output);

Output:

end: 0.428ms
input
[ 'var pth = require("path");',
  'var _ = require(\'lodash\');',
  'const asrt = require(\'assert\'),',
  'fs = require(\'fs\'),',
  'cp = require(\'child_process\');' ]
--------------------------------------
output
[ 'var pth = require("path");',
  'const asrt = require(\'assert\'),',
  'fs = require(\'fs\'),',
  'cp = require(\'child_process\');' ]
Shanoor
  • You don't want to mismatch **quotes**. Use back-reference to captured groups. –  Mar 20 '16 at 10:04
  • I get this output, given this input: – Alexander Mills Mar 20 '16 at 18:48
  • input: var assert = require('assert'); const fs = require('fs'); – Alexander Mills Mar 20 '16 at 18:48
  • output: core module match: 'assert' core module match: ' core module match: assert core module match: 'fs' core module match: ' core module match: fs – Alexander Mills Mar 20 '16 at 18:48
  • so this regex is matching this literal: ' ... and also, it's matching 'assert' or assert , etc. – Alexander Mills Mar 20 '16 at 18:49
  • I have to capture '/" to check if the closing quote is the same as the opening, you could just grab match[1] instead of logging the entire array. Or, if you're sure the input is valid JS, you can use this regex: `var rgxStr = \`(?:"|')(${ builtinModules.join('|') })(?:"|')\`;`. And no, it doesn't match `assert` alone, only if it's between quotes. – Shanoor Mar 21 '16 at 13:34
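
The exchange above comes down to which element of the match array to read: the whole match comes first, followed by one entry per capture group, so with a captured quote the module name is not at index 0. The sketch below collects only the names into an array. It rests on assumptions that are not in the original answer: a short hand-written `coreModules` list stands in for the `builtin-modules` package, the pattern is tightened to require the surrounding `require(...)`, and `findCoreModules` is a hypothetical helper name.

```javascript
// Assumption: a short hand-written list stands in for the full list of core modules.
const coreModules = ['assert', 'child_process', 'fs', 'http', 'path'];

// Escape regex metacharacters in case a module name ever contains one.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Group 1 is the opening quote, group 2 is the module name,
// and \1 forces the closing quote to equal the opening one.
const pattern = new RegExp(
  `require\\(\\s*("|')(${coreModules.map(escapeRegex).join('|')})\\1\\s*\\)`,
  'g'
);

// Hypothetical helper: returns every core module referenced in `source`.
function findCoreModules(source) {
  return Array.from(source.matchAll(pattern), (m) => m[2]);
}

const source = `var pth = require("path");
var _ = require('lodash');
const asrt = require('assert'),
fs = require('fs'),
cp = require('child_process');`;

console.log(findCoreModules(source));
// [ 'path', 'assert', 'fs', 'child_process' ]
```

Running the pattern over the whole source with the `g` flag avoids splitting into lines first. In recent Node.js versions, `require('module').builtinModules` provides a similar list without an extra dependency.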