
I'm trying to write a regex to catch any use of private members in Python, with the exception of function names.

For example, the following should return true:

a = __something__
b.__something()
__bla = 5
a[__bla__]
... etc etc

But the following should return false:

def __unicode__(self):
    ....

(because it has the "def" before it)

I wrote this expression:

regexp = re.compile(r'(?!def\s)[^a-zA-Z^_\s]__[a-zA-Z]')

And it works for most cases, but for some reason it always returns false if there's a space before the private name, e.g. this will not return true:

regexp.search("something = __private")

What am I doing wrong here? The "(?!def\s)" should prevent a match if "def " comes before it, and I handle spaces before the two underscores inside "[^a-zA-Z^_\s]". So why isn't it working?

EDIT:

While the accepted answer is correct for regex, I recommend looking at Padraic Cunningham's answer for a better solution using ast. Thanks,

Ronen Ness

2 Answers


You could try a negative lookbehind instead of the lookahead:

(?<!def\s)(\b__[a-zA-Z])
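
As far as I can tell, the original pattern fails for two reasons: `(?!def\s)` is a lookahead, so it inspects the text that follows the current position rather than what precedes it, and the class `[^a-zA-Z^_\s]` excludes whitespace, so a space directly before the `__` can never match. A minimal sketch of the lookbehind pattern in Python follows; the helper name `uses_private` and the sample lines are illustrative and not part of the original answer.

```python
import re

# Negative lookbehind: reject "__name" when the four characters before it
# are "def" plus exactly one whitespace character.
# \b keeps the pattern from firing in the middle of an identifier ("a__stuff").
PRIVATE_RE = re.compile(r'(?<!def\s)(\b__[a-zA-Z])')


def uses_private(line):
    """Return True if the line appears to use a double-underscore name."""
    return PRIVATE_RE.search(line) is not None


should_match = [
    "a = __something__",
    "b.__something()",
    "__bla = 5",
    "a[__bla__]",
    "something = __private",
]
should_not_match = [
    "def __unicode__(self):",
    "a__stuff = 1",
]

for line in should_match + should_not_match:
    print(f"{uses_private(line)!s:5}  {line}")
```

Two limitations are worth keeping in mind. Python's `re` module only accepts fixed-width lookbehinds, so a line with two spaces after `def` would still match. The pattern also knows nothing about Python syntax, so it matches double-underscore names inside string literals and comments.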


Till
  • Good answer, but it still catches things like "a__stuff". I added "\b" before the '__' and now it works perfectly. Thanks :) – Ronen Ness Mar 27 '16 at 23:55
  • I think it's good practice for future seekers, but totally up to you. PS: notice that with the \b it also catches names inside strings, but for my case that's OK. – Ronen Ness Mar 28 '16 at 00:00

Using ast.NodeVisitor it is very easy to get the attributes and a lot more reliable than a regex:

import inspect
import importlib
import ast

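class FindAttr(ast.NodeVisitor):
    def visit_Attribute(self, node):
        # Called for every attribute access, e.g. self.__foo
        print(node.attr)
        # Keep walking so chained attributes like a.__b.c are all visited
        self.generic_visit(node)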


mod = "test"
mod = importlib.import_module(mod)
p = ast.parse(inspect.getsource(mod))

f = FindAttr()
f.visit(p)

test.py:

class Foo(object):
    def __init__(self):
        self.__foo = "foo"

    def meth1(self):
        self.bar = "bar"

    def meth2(self):
        self.__foobar = "foobar"


    def meth3(self):
        self.blah = "foobar"
        return self.blah

Output:

In [7]: mod = "test"

In [8]: mod = importlib.import_module(mod)

In [9]: p = ast.parse(inspect.getsource(mod))

In [10]: f = FindAttr()

In [11]: f.visit(p)
__foo
bar
__foobar
blah

All you need to do is check whether node.attr.startswith("__"). You can visit any node type you like, such as FunctionDef or ClassDef; the Green Tree Snakes docs have a comprehensive list of all the nodes and their attributes.
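
The check described above can be sketched as follows. This is one possible arrangement, not the answerer's code: the class name `PrivateUseFinder`, the `hits` list, and the sample source are hypothetical, and the sketch also visits `ast.Name` nodes so that bare names such as `__bla = 5` are reported. Function names are stored on `FunctionDef.name` rather than in `Attribute` or `Name` nodes, so `def __unicode__` is skipped without any special handling.

```python
import ast

# Hypothetical sample source, parsed from a string instead of an imported module.
SOURCE = '''
class Foo(object):
    def __init__(self):
        self.__foo = "foo"

    def __unicode__(self):
        return self.bar

__bla = 5
text = "__not_code__"
'''


class PrivateUseFinder(ast.NodeVisitor):
    """Collect (line, name) pairs for double-underscore attributes and names."""

    def __init__(self):
        self.hits = []

    def visit_Attribute(self, node):
        if node.attr.startswith("__"):
            self.hits.append((node.lineno, node.attr))
        # Keep walking so chained attributes like a.__b.c are all visited.
        self.generic_visit(node)

    def visit_Name(self, node):
        # Bare identifiers such as "__bla" are Name nodes, not Attribute nodes.
        if node.id.startswith("__"):
            self.hits.append((node.lineno, node.id))


finder = PrivateUseFinder()
finder.visit(ast.parse(SOURCE))
for lineno, name in finder.hits:
    print(lineno, name)
```

Because the source is parsed rather than pattern-matched, the string literal `"__not_code__"` is not reported, which is the case the regex approach cannot distinguish.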

Padraic Cunningham
  • Your simple example convinced me to try out ast and it turned out great. It really is super easy to use and results are much more reliable. I already accepted the other answer (and also this question is about regex) but this is definitely a better solution for most cases. Thanks! – Ronen Ness Mar 29 '16 at 20:55
  • @Ness, no worries, yes it is really easy to use and more importantly a reliable way to parse. – Padraic Cunningham Mar 29 '16 at 23:21