I have a Python file that looks something like this:

class Hello():
    something = 0
    someotherthing = 2

class Heythere():
    whatsthis()
    def whatsthis():
        dosomething=0

class Anotherclass():
    imavar=2
    whatsup='?'

....

And it continues like this for some time; there are a lot of classes. I want to capture each class into a list using a regular expression. The regex should always start capturing at "class" and stop where there are two line breaks in a row. Here is what I tried, which got me nowhere. I am not familiar with regular expression syntax at all, so maybe I am doing things completely wrong:

import re

r = open('python.py','r').read()
x = re.findall(r'(class?)\n\n', r)

`x` is always an empty list, `[]`.

I'm not sure what I am doing wrong, but I am fairly certain my syntax is completely off. I just don't know where to start.

Titus P
  • why don't you start with `open(...).read().split('\n\n')` then the regex will be much easier :-) – thebjorn Feb 12 '14 at 19:50
  • I would *strongly* suggest not parsing the code with regex. If you need a list of classes in a module look into the `inspect` module. Also, this answer: http://stackoverflow.com/questions/1796180/python-get-list-of-all-classes-within-current-module. – g.d.d.c Feb 12 '14 at 19:52
  • Well both of those answers are way better than what I was thinking :) Thanks guys! – Titus P Feb 12 '14 at 19:56
  • Also, the `ast` module can parse Python code without running it. – user2357112 Feb 12 '14 at 20:14
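
To make the `ast` suggestion from the comments concrete, here is a minimal sketch of one way it could work. It assumes Python 3.8 or newer (for `ast.get_source_segment`); the `SOURCE` string and the `class_sources` helper are hypothetical names made up for illustration, with `SOURCE` standing in for the contents of `python.py`.

```python
import ast

# Hypothetical sample standing in for the contents of python.py.
SOURCE = '''class Hello():
    something = 0
    someotherthing = 2

class Heythere():
    def whatsthis(self):
        dosomething = 0

class Anotherclass():
    imavar = 2
    whatsup = '?'
'''


def class_sources(source):
    """Return the source text of each top-level class without running the code."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)  # exact text spanned by the node
        for node in tree.body
        if isinstance(node, ast.ClassDef)
    ]


classes = class_sources(SOURCE)
for block in classes:
    print(block)
    print('---')
```

Unlike a blank-line rule, this does not depend on how the classes are spaced, so a class with an empty line in its body would still come back whole.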

2 Answers

This regex will capture your groups:

((?:.*\n){1,5}.*)\n\n

Demo here: http://rubular.com/r/MBLLb2m8WG
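
For use from Python's `re` module, a pattern that follows the question's wording literally (start at "class", stop at two line breaks in a row) might look like the sketch below. This is a different pattern from the one above, not a translation of it, and the `SOURCE` string and `CLASS_BLOCK` name are assumptions made up for the example.

```python
import re

# Hypothetical sample standing in for the contents of python.py.
SOURCE = (
    "class Hello():\n"
    "    something = 0\n"
    "    someotherthing = 2\n"
    "\n"
    "class Heythere():\n"
    "    def whatsthis(self):\n"
    "        dosomething = 0\n"
    "\n"
    "class Anotherclass():\n"
    "    imavar = 2\n"
    "    whatsup = '?'\n"
)

# ^class\b      -> "class" at the start of a line (needs re.MULTILINE)
# .*?           -> as little as possible, across newlines (needs re.DOTALL)
# (?=\n\n|\Z)   -> stop just before a blank line or the end of the text
CLASS_BLOCK = re.compile(r'^class\b.*?(?=\n\n|\Z)', re.MULTILINE | re.DOTALL)

# rstrip() drops the trailing newline that the last block picks up at end of text.
blocks = [block.rstrip() for block in CLASS_BLOCK.findall(SOURCE)]
print(len(blocks))
```

Note that a blank line inside a class body would end that block early, which is one reason the comments recommend against parsing code with a regex.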

aelor

Is this anything like what you want?

import re
r = open('python.py','r').read()
x = re.findall(r'class .+', r)
steffffffff
  • A context manager would be appropriate here. `with open('python.py','r') as fin: r = fin.read()` – GVH Feb 12 '14 at 21:06
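
Putting the context-manager suggestion together with the answer's pattern, a rough sketch could look like this. The `class_header_lines` helper is a hypothetical name, the `^` anchor and `re.MULTILINE` flag are additions not in the answer (so that "class " is only matched at the start of a line), and the temporary file exists only to keep the example self-contained.

```python
import os
import re
import tempfile


def class_header_lines(path):
    """Return each line that begins with 'class ' (the header line only)."""
    # The with-block closes the file even if read() raises.
    with open(path, 'r') as fin:
        text = fin.read()
    return re.findall(r'^class .+', text, flags=re.MULTILINE)


# Demo on a throwaway file standing in for python.py.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'python.py')
    with open(path, 'w') as fout:
        fout.write("class Hello():\n    something = 0\n\nclass Heythere():\n    pass\n")
    headers = class_header_lines(path)

print(headers)
```

Because `.` does not match a newline by default, this returns only the `class ...` line of each class, not its body.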