
Can someone help me write a single regex to get the module(s) from a Python source line?

from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz

Each line has three sub-parts:

[from(\s)<module>(\s)] --> get module if this part exists
import(\s)<module>     --> get module
[(\s)as(\s)<alias>]    --> ignore if this part exists

Something like this:

:?[from(\s)<module>(\s)]import(\s)<module>:?[(\s)as(\s)<alias>]
Murali Mopuru

2 Answers


Instead of using a regex, the built-in Python module ast might be a better approach, since it parses Python syntax for you: https://docs.python.org/2/library/ast.html

import ast

import_string = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

modules = []
for node in ast.iter_child_nodes(ast.parse(import_string)):
    if isinstance(node, ast.ImportFrom):
        if not node.names[0].asname:  # skip the statement if its first name has an 'as' alias
            modules.append(node.module)
    elif isinstance(node, ast.Import):
        if not node.names[0].asname:  # skip the statement if its first name has an 'as' alias
            modules.append(node.names[0].name)

That will give you ['abc.lmn', 'abc'] (the two aliased imports are skipped), and it is fairly easy to tweak if you want to pull other information.
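If the goal is to keep the module even when an alias is present, one possible tweak is sketched below. The helper name imported_modules is made up for illustration. It assumes that only top-level statements matter, and that a relative import such as "from . import x" should be reported by its leading dots.

```python
import ast

def imported_modules(source):
    """Return the module named by each top-level import, ignoring aliases."""
    modules = []
    for node in ast.iter_child_nodes(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; node.level counts the dots
            modules.append("." * node.level + (node.module or ""))
        elif isinstance(node, ast.Import):
            # "import a, b as c" carries one alias entry per imported name
            modules.extend(alias.name for alias in node.names)
    return modules

source = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

print(imported_modules(source))  # ['abc.lmn', 'abc.lmn', 'abc', 'abc']
```

The names after import in a from statement (pqr here) are available as alias.name on node.names if those are wanted as well.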

Micky Loo

Looks like you could make the from part optional and the import part required.
This first version only matches lines that have no as clause.

(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)[ ]*$

https://regex101.com/r/fmoAuh/1

Explained

 (?m)                          # Modifiers: multi-line
 ^                             # Beginning of line
 (?:                           # Optional from
      from [ ]+ 
      ( \S+ )                       # (1), from <module>
      [ ]+ 
 )?

 import [ ]+                   # Required import
 ( \S+ )                       # (2), import <module>
 [ ]* 
 $                             # End of line
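
As a quick check of how this first pattern behaves from Python, here is a minimal sketch using the re module on the four lines from the question. The variable names are only illustrative.

```python
import re

# First pattern from above; (?m) lets ^ and $ match at every line boundary
PATTERN = re.compile(r"(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)[ ]*$")

source = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

# Group 1 is the "from" module (None when absent), group 2 the "import" module
matches = [m.groups() for m in PATTERN.finditer(source)]
print(matches)  # [('abc.lmn', 'pqr'), (None, 'abc')]
```

The two lines with an as clause produce no match here, because nothing but spaces may follow the imported name.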

Or, if you want to match lines with an as clause too, without capturing the alias, use this.

(?m)^(?:from[ ]+(\S+)[ ]+)?import[ ]+(\S+)(?:[ ]+as[ ]+\S+)?[ ]*$

https://regex101.com/r/xFtey5/1

Expanded

 (?m)                          # Modifiers: multi-line
 ^                             # Beginning of line
 (?:                           # Optional from
      from [ ]+ 
      ( \S+ )                       # (1), from <module>
      [ ]+ 
 )?

 import [ ]+                   # Required import
 ( \S+ )                       # (2), import <module>

 (?:                           # Optional as
      [ ]+ 
      as [ ]+ 
      \S+                          # <alias>
 )?
 [ ]* 
 $
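
The expanded layout can be used almost as-is in Python through re.VERBOSE, since the spaces are written as [ ] and so survive verbose mode. The sketch below passes the flags to re.compile rather than using the inline (?m); the names IMPORT_RE and modules are assumptions for illustration.

```python
import re

IMPORT_RE = re.compile(
    r"""
    ^                      # beginning of line
    (?:                    # optional from
        from [ ]+
        ( \S+ )            # (1), from <module>
        [ ]+
    )?
    import [ ]+            # required import
    ( \S+ )                # (2), import <module>
    (?:                    # optional as
        [ ]+
        as [ ]+
        \S+                # <alias>, matched but not captured
    )?
    [ ]*
    $                      # end of line
    """,
    re.MULTILINE | re.VERBOSE,
)

source = """from abc.lmn import pqr
from abc.lmn import pqr as xyz
import abc
import abc as xyz"""

# Flatten the captured groups, dropping the None left by a missing "from"
modules = [name for m in IMPORT_RE.finditer(source) for name in m.groups() if name]
print(modules)  # ['abc.lmn', 'pqr', 'abc.lmn', 'pqr', 'abc', 'abc']
```

Note that this pattern does not match indented imports or comma-separated forms such as "import a, b"; those cases would need either a broader pattern or a real parser.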