0

How do you write a function that returns the word after "class", but only if "class" is not inside a quoted string (single quotes, double quotes, or either kind of triple quotes) and only if it is spelled exactly (for example, "classf hi" must not match)?

"class hi()"  > hi

"class hi(dff)"  > hi

"class hi   (  dff  )  :"  > hi 

"  class        hi       (  dff  )  :"  > hi 

"class hi"  > hi

"classf hi"  > Nothing

"fclass hi"  > Nothing

"'class hi(dd)'"  > Nothing

'"class hi(dd)"'  > Nothing

"'''class hi(dd)'''"  > Nothing

'"""class hi(dd)"""'  > Nothing

'"""\n\n\n\nclass hi(dd)\n\n\n\n"""'  > Nothing    

"'class' hi()"  > Nothing

This is too hard for me to do with loops. If anyone can help, that would be nice, thanks. This is pretty challenging.

herbertD
user1357159
  • try this: `re.search(r'\bclass\s+(\w+)',line)` – the wolf Jun 09 '12 at 19:59
  • What is the source of the strings? If it is Python source code, then you could use the `ast` module to extract all class names (if the `tokenize` module doesn't work for you), e.g., http://stackoverflow.com/questions/585529/find-all-strings-in-python-code-files – jfs Jun 09 '12 at 20:01
  • Hmm, some of the test cases don't work with that one, carrot – user1357159 Jun 09 '12 at 20:01
  • The source of the strings is just a normal string, not a file – user1357159 Jun 09 '12 at 20:01

4 Answers

4

Something like this, maybe?

from io import StringIO
from tokenize import generate_tokens
from token import NAME

def classname(s):
    # Tokenize the string; quoted text arrives as STRING tokens, never NAME
    tokens = generate_tokens(StringIO(s).readline)
    for toknum, tokval, _, _, _ in tokens:
        if toknum == NAME and tokval == 'class':
            # The token right after the keyword is taken as the class name
            return next(tokens)[1]

print(classname("class hi(29):"))
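
The tokenizer approach returns whatever token follows `class`, even when that token is not a name, and it can raise on malformed input. A slightly more defensive variant might look like the sketch below. The name `classname_checked` is hypothetical, and treating a tokenizer error as "no match" is an assumption, not something the question specifies.

```python
import io
import tokenize
from token import NAME

def classname_checked(s):
    """Return the identifier following a bare `class` keyword, else None."""
    tokens = tokenize.generate_tokens(io.StringIO(s).readline)
    try:
        for tok in tokens:
            if tok.type == NAME and tok.string == "class":
                following = next(tokens, None)
                # Accept the next token only if it is itself a name
                if following is not None and following.type == NAME:
                    return following.string
                return None
    except (tokenize.TokenError, SyntaxError):
        # Assumption: input the tokenizer rejects counts as "no match"
        return None
    return None
```

Because quoted text is tokenized as a single STRING token, a `class` inside quotes is never seen as a NAME, so no separate quote-stripping pass is needed.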
ffao
3
import re

def remove(reg, s, multiline=False):
    # re.DOTALL lets "." span newlines, which triple-quoted strings need
    flags = (re.M | re.DOTALL) if multiline else re.M
    return re.sub(reg, "", s, flags=flags)

def classname(s):
    # Strip triple-quoted strings first so their delimiters are not
    # mistaken for pairs of ordinary quotes
    s = remove(r'""".*?"""', s, multiline=True)
    s = remove(r"'''.*?'''", s, multiline=True)
    s = remove(r'".*?"', s)
    s = remove(r"'.*?'", s)

    # "class" must start a line or follow whitespace, then comes the name
    res = re.search(r"(^|\s)class\s+(\w+)", s, flags=re.M)
    return res.group(2) if res else None

I wanted to use \b instead of (^|\s) but it didn't seem to want to work?
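
One likely explanation, assuming the `\b` attempt was written as an ordinary (non-raw) string literal: Python itself turns "\b" into a backspace character before the regex engine sees it, so the pattern looks for a literal backspace instead of a word boundary. The sketch below compares the two spellings; `plain`, `raw` and `first_class_name` are illustrative names only.

```python
import re

# Non-raw literal: "\b" is the backspace character (ASCII 8), so this
# pattern requires a literal backspace immediately before "class".
plain = re.compile("\bclass\\s+(\\w+)")

# Raw literal: \b reaches the regex engine as a word-boundary assertion.
raw = re.compile(r"\bclass\s+(\w+)")

def first_class_name(pattern, s):
    """Return the first captured name, or None when nothing matches."""
    m = pattern.search(s)
    return m.group(1) if m else None
```

With the raw pattern, "class hi" yields the name while "fclass hi" and "classf hi" do not match. Note that `\b` is looser than `(^|\s)`: it also accepts `class` directly after punctuation such as `;` or `.`.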

I also put together the following test code:

tests = [
    ("class hi()", "hi"),
    ("class hi(dff)", "hi"),
    ("class hi   (  dff  )  :", "hi"),
    ("  class        hi       (  dff  )  :", "hi"),
    ("class hi", "hi"),
    ("classf hi", None),
    ("fclass hi", None),
    ("'class hi(dd)'", None),
    ('"class hi(dd)"', None),
    ("'''class hi(dd)'''", None),
    ('"""class hi(dd)"""', None),
    ('"""\n\n\n\nclass hi(dd)\n\n\n\n"""', None),   
    ("'class' hi()", None),
    ("a = ''; class hi(object): pass", "hi")
]

def run_tests(fn, tests=tests):
    for inp,outp in tests:
        res = fn(inp)
        if res == outp:
            print("passed")
        else:
            print("FAILED on {} (gave '{}', should be '{}')".format(inp, repr(res), repr(outp)))
Hugh Bothwell
2

Use regular expressions:

import re
pattern = re.compile(r"\s*class\s+(\w+)")

For example:

>>> line_to_test = "  class        hi       (  dff  )  :" 
>>> match = pattern.match(line_to_test)
>>> match
<org.python.modules.sre.MatchObject object at 0x3>
>>> match.groups()
('hi',)
Joel Cornett
  • You need to remove quoted strings first, and then the regular expression will not be so easy. – ddzialak Jun 09 '12 at 19:46
  • @ddzialak: I don't get your meaning. OP doesn't want to match quoted strings. – Joel Cornett Jun 09 '12 at 19:49
  • This works well for most of the test cases, but for "classf hi" your method gives "f"; it should give nothing – user1357159 Jun 09 '12 at 19:53
  • @user1357159: Fixed. I changed the second `*` to a `+` above. Should work fine now. – Joel Cornett Jun 09 '12 at 20:01
  • I've tested both this and ffao's answer with the time module and this one seems to go faster. – user1357159 Jun 09 '12 at 20:13
  • @user1357159: try it on "a = ''; class D(object): pass". ffao's is a better solution than regular expressions. Joel Cornett's solution takes advantage of a missing test case in your given test data (a case *following* a string). – Hugh Bothwell Jun 09 '12 at 20:41
0
  1. Remove all substrings that are enclosed in quotes (i.e., ', " and ''' or """).
  2. Use regular expressions to match the expression "class (name of class here)".

You may need to tweak the regular expression to properly match all valid Python identifiers for class names:

import re
m = re.match(r"class (\w+)", "class hi")
# group(1) is the captured class name; group(0) would be the whole match
print(m.group(1))
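
As one possible way to do that tweaking, the sketch below tightens the pattern so the name cannot start with a digit, then double-checks the result with `str.isidentifier()` and `keyword.iskeyword()`. It assumes Python 3 and that quoted substrings have already been removed (step 1); `CLASS_RE` and `class_name_strict` are hypothetical names.

```python
import keyword
import re

# [^\W\d] means "a word character that is not a digit",
# i.e. a letter or underscore, so names like "9lives" are rejected.
CLASS_RE = re.compile(r"^\s*class\s+([^\W\d]\w*)")

def class_name_strict(line):
    """Return the class name if it is a valid, non-keyword identifier."""
    m = CLASS_RE.match(line)
    if m is None:
        return None
    name = m.group(1)
    # Final check against Python's own identifier and keyword rules
    if not name.isidentifier() or keyword.iskeyword(name):
        return None
    return name
```

The regex does the coarse filtering and the two built-in checks catch the edge cases, which keeps the pattern itself short.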
Simeon Visser