1

This loop is used in barcode scanning software. It may run as many times as a barcode is scanned, which is hundreds of times in an hour.

# locpats is a list of regular expression patterns of possible depot locations

for pat in locpats:
    q = re.match(pat, scannedcode)
    if q:
        print(q)
        return True

q is a Match object. The print(q) tells me that every match object gets its own little piece of memory. They'll add up. I have no idea to what amount in total.

I don't need the Match object anymore once inside the if. Should I wipe it, like so?

    q = re.match(pat, scannedcode)
    if q:
        q = None
        return True

Or is there a cleaner way? Should I bother at all?

If I understand right (from this), garbage collection with gc.collect() won't happen until a process terminates, which in my case is at the end of the day when the user is done scanning. Until that time, these objects won't be regarded as garbage, even.

RolfBly
  • Are you running cPython? gc behavior depends on which flavor of python you're running. – roippi Feb 12 '14 at 15:17
  • 2
    `q = None` removes the reference in the local name (or rather, replaces it with a reference to the `None` object). It does absolutely nothing to the object. This distinction, along with many related ones, is of vital importance if you want to reason about garbage collection and memory use. –  Feb 12 '14 at 15:19
  • 3
    Are you sure this is a problem at all? Have you observed your application slowing down or growing in size over the course of a day? For all we know, it allocates 10 KB an hour and isn't worth worrying about. – Kevin Feb 12 '14 at 15:19
  • You would get a bigger performance boost by [compiling](http://docs.python.org/2/library/re.html#re.compile) the regexps. – Steinar Lima Feb 12 '14 at 15:31
  • "If I understand right (from this), garbage collection with gc.collect() won't happen until a process terminates" - I don't know how you got that impression from that link. Garbage collection happens all the time. As long as your objects are actually becoming garbage, you shouldn't have a problem. – user2357112 Feb 12 '14 at 15:31
  • @SteinarLima that only applies if he has more than 100 regex. Otherwise, the `re` module caches them. – roippi Feb 12 '14 at 15:32
  • It's Python 3.2. @Kevin No, I don't. That's also why I asked 'should I bother at all'. I'm trying to learn what good practice is. – RolfBly Feb 12 '14 at 15:33
  • @user2357112 Thank you. I guess I should read up on how objects actually become garbage. If you have any suggestions for reading matter, I'd be much obliged. – RolfBly Feb 12 '14 at 15:37

3 Answers

3

CPython uses reference counting (plus some cyclic reference detection, not applicable here) to handle gc of objects. Once an object reaches 0 extant references, it will be immediately gc'd.

In the case of your loop:

for pat in locpats:
    q = re.match(pat, scannedcode)

Each successive pat in locpats binds the result of a new re.match call (a match object, or None if there was no match) to q. The previous match object then has 0 remaining references and is garbage collected immediately. A similar situation applies when you return from your function.

This is all an implementation detail of CPython; other flavors of Python will handle gc differently. In all cases, don't prematurely optimize. Unless you can pinpoint a specific reason to do so, leaving the gc alone is likely to be the most performant solution.
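To see the rebinding behaviour for yourself, here is a minimal sketch. It assumes CPython, and it uses a hypothetical `Token` class as a stand-in for the match object, because a weak reference is a convenient way to watch an object without keeping it alive. The names `Token`, `demo` and `watcher` are made up for the illustration.

```python
import weakref


class Token:
    """Hypothetical stand-in for a match object; instances can be weakly referenced."""


def demo():
    q = Token()
    watcher = weakref.ref(q)      # a weak reference does not keep the object alive
    alive_before = watcher() is not None

    q = Token()                   # rebinding q drops the only strong reference
    alive_after = watcher() is not None  # on CPython the old object is already gone
    return alive_before, alive_after


alive_before, alive_after = demo()
print(alive_before, alive_after)
```

On CPython the first object is reclaimed the moment `q` is rebound, so no explicit `q = None` is needed. On other implementations the object may linger until their collector runs.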

roippi
  • Thank you very much. Gee, Python is really beautiful, well-implemented. Reassuring! For readers like me who are new to cPython vs other implementations, [this is a nice summary](http://stackoverflow.com/questions/17130975/python-vs-cpython). – RolfBly Feb 12 '14 at 19:38
0

This is not a problem, since q is local, and therefore won't persist after you return.

If you want to make yourself feel better, you can try

if re.match(pat, scannedcode):
  return True

which will do what you're doing now without ever naming the match - but it won't change your memory footprint.

(I'm assuming that you don't care about the printed value at all, it's just diagnostic)
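Taking the same idea one step further, the whole loop can be written without naming the match at all. This is only a sketch: the function name `is_depot_location` and the two patterns are hypothetical, since the real contents of locpats are not shown. The patterns are compiled once up front, as suggested in the comments.

```python
import re

# Hypothetical depot location patterns; the real locpats are not shown in the question.
LOCPATS = [
    re.compile(r"DEPOT-\d{3}"),
    re.compile(r"BAY[A-F]\d{2}"),
]


def is_depot_location(scannedcode, patterns=LOCPATS):
    """Return True if scannedcode matches the start of any known location pattern."""
    # any() stops at the first pattern that matches, like the original loop.
    return any(pat.match(scannedcode) for pat in patterns)


print(is_depot_location("DEPOT-042"))
print(is_depot_location("not a location"))
```

Each match object here is only referenced for as long as `any()` needs to test it, so the memory behaviour is the same as in the original loop.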

Jon Kiparsky
0

If your print statement is showing that each match is getting its own piece of memory, then it looks like one of two things is happening:

1) As others have mentioned, you are not using CPython as your interpreter, and the interpreter you have chosen is doing something strange with garbage collection.

2) There is code you haven't shown us here which is keeping a reference to the match object, so that the GC code never frees it because the reference count of the match object never reaches zero.

Is either of these the case?
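One way to check the second case is to measure it. The sketch below is an illustration under assumptions, not your actual code: the pattern, the `scan` and `growth` helpers and the `keep` list are all hypothetical, and `tracemalloc` is only in the standard library from Python 3.4 onward, so it would not run on 3.2 as is. It compares a scan loop that lets match objects go with one that stores them in a list.

```python
import re
import tracemalloc

PATTERNS = [re.compile(r"DEPOT-\d{3}")]  # hypothetical pattern


def scan(code, keep=None):
    for pat in PATTERNS:
        q = pat.match(code)
        if q:
            if keep is not None:
                keep.append(q)  # stray reference: the match object stays alive
            return True
    return False


def growth(n, keep=None):
    """Bytes still allocated after n scans, relative to the starting point."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(n):
        scan("DEPOT-%03d" % (i % 1000), keep)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


clean_growth = growth(5000)            # matches are dropped after each scan
leaky_growth = growth(5000, keep=[])   # every match is kept in a list
print(clean_growth, leaky_growth)
```

If the numbers for your real application look like the first case, there is nothing to fix. If they look like the second, something is holding on to the match objects.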

Tommy