I have a list of possible Python import statements that I need to parse in JavaScript. I found the post "regex to parse import statements in python" and adapted its regex for JavaScript, but for some reason not all of the statements are parsed.

Here is the test:

const re = /^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)(?:[ ]+as[ ]+\S+)?[ ]*$/g;

const lines = ['import numpy as np',
'import pandas as pd',
'import pkg.mod1, pkg.mod2',
'from pkg.mod2 import Bar as Qux',
'from abc.lmn import pqr',
'from abc.lmn import pqr as xyz',
'import mod',
'from mod import s, foo',
'from mod import *',
'from pkg.mod3 import *',
'from mod import s as string, a as alist',
'import re, json'];

for (var i = 0; i < lines.length; i++){
const res = re.exec(lines[i]);
console.log(res);
}

Ideally, the code would extract the names of the packages that need to be loaded (not the modules), but it is fine if it at least works on all the examples.

Ideal expected result:

  • 'numpy',
  • 'pandas',
  • 'pkg',
  • 'pkg',
  • 'abc',
  • 'abc',
  • 'mod',
  • 'mod',
  • 'mod',
  • 'pkg',
  • 'mod'
  • ['re', 'json']
mimic

3 Answers

You can use this regex:

/^(?:from\s+(\w+)(?:\.\w+)?\s+)?import\s+([^\s,.]+)(?:\.\w+)?/

Code:

const lines = ['import numpy as np',
'import pandas as pd',
'import pkg.mod1, pkg.mod2',
'from pkg.mod2 import Bar as Qux',
'from abc.lmn import pqr',
'from abc.lmn import pqr as xyz',
'import mod',
'from mod import s, foo',
'from mod import *',
'from pkg.mod3 import *',
'from mod import s as string, a as alist',
'import re, json'];

const re = /^(?:from\s+(\w+)(?:\.\w+)?\s+)?import\s+([^\s,.]+)(?:\.\w+)?((\s*,\s*\w+)*$)?/;

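const results = [];
lines.forEach(el => {
  const m = el.match(re);
  if (m) {
    // m[1] is the top-level package after "from", if present.
    // Otherwise use the first imported name (m[2]) plus any trailing ", name" list (m[3]).
    results.push(m[1] === undefined ? m[2] + (m[3] === undefined ? "" : m[3]) : m[1]);
  }
});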

console.log(results);
anubhava

You can try the regex below. It does not match your last line, since you added that line in a later edit (see the update at the end).

const re = /(import|from)\s+([^\s\.]+)/;

const lines = [
'import numpy as np',
'import pandas as pd',
'import pkg.mod1, pkg.mod2',
'from pkg.mod2 import Bar as Qux',
'from abc.lmn import pqr',
'from abc.lmn import pqr as xyz',
'import mod',
'from mod import s, foo',
'from mod import *',
'from pkg.mod3 import *',
'from mod import s as string, a as alist'
];

for (var i = 0; i < lines.length; i++){
    // console.log(lines[i]);
    const res = re.exec(lines[i]);
    console.log(res[2]);
}

It is easier to explain than yours, which did not work.

(import|from) : matches either import or from

\s+ : one or more whitespace characters

[^\s.]+ : one or more characters that are neither whitespace nor a dot

Also, beware of using the /g flag in a loop:

Why does a RegExp with global flag give wrong results?
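
To illustrate the /g problem, here is a minimal sketch. The regex and the two sample lines are simplified stand-ins chosen for the demonstration, not the regex from the question. A global regex keeps its lastIndex between exec calls, so the next string is searched from where the previous match ended.

```javascript
// Simplified stand-in regex with the /g flag (an assumption for this demo).
const globalRe = /^import\s+(\w+)/g;
const sample = ['import numpy as np', 'import mod'];

// The first call matches and leaves lastIndex at the end of "import numpy".
const first = globalRe.exec(sample[0]);
const indexAfterFirst = globalRe.lastIndex;

// The second call starts searching at that stale lastIndex, so it fails
// and returns null (lastIndex is then reset to 0).
const second = globalRe.exec(sample[1]);
console.log(first[1], indexAfterFirst, second);

// Without /g, every call starts at index 0 and both lines match.
const plainRe = /^import\s+(\w+)/;
const plainResults = sample.map(line => plainRe.exec(line)[1]);
console.log(plainResults);
```

Dropping the /g flag, or setting lastIndex back to 0 before each call, avoids the skipped matches.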

Update to match your last line

const re = /(import|from)\s+([^\.]+?[^,])(\s|\.|$)/;

This is just the regex. I did not split the last result into an array, since you should know how to do that and the other answer already covers it.
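
For completeness, a rough sketch of how the captured group could be turned into an array. The helper name topLevelNames is hypothetical, and the approach assumes the comma-separated names carry no dotted submodules, as in 'import re, json'.

```javascript
// Updated regex from this answer; group 2 holds the name or the comma-separated names.
const updatedRe = /(import|from)\s+([^\.]+?[^,])(\s|\.|$)/;

// Hypothetical helper: returns the captured names as an array.
function topLevelNames(line) {
  const m = updatedRe.exec(line);
  if (!m) return [];
  // Split "re, json" on commas and trim the surrounding whitespace.
  return m[2].split(',').map(name => name.trim());
}

console.log(topLevelNames('import re, json'));
console.log(topLevelNames('import numpy as np'));
console.log(topLevelNames('from pkg.mod3 import *'));
```

Note that this regex needs at least two characters in the captured name, so a line such as 'import a' would return an empty array.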

Dri372

Would this be enough? /(?:import|from)\s+(\w+)/
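
A quick way to check it against a few of the lines from the question (a sketch only; the variable names are made up for this example):

```javascript
const shortRe = /(?:import|from)\s+(\w+)/;

const examples = [
  'import numpy as np',
  'import pkg.mod1, pkg.mod2',
  'from abc.lmn import pqr as xyz',
  'from mod import *',
  'import re, json'
];

// Take the first word after "import" or "from"; null if the line does not match.
const names = examples.map(line => {
  const m = shortRe.exec(line);
  return m ? m[1] : null;
});

console.log(names);
```

For the comma-separated line 'import re, json' only the first name is captured, so the remaining names would still need separate handling.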

GustavMH