I have a page that I fetch periodically using this script:

from twisted.web.client import getPage
from twisted.internet import reactor, task

def getData():
    dgp = getPage('http://www.google.com/')
    dgp.addCallback(dataLoadOK)
    dgp.addErrback(dataLoadError)

def dataLoadOK(value):
    print value

def dataLoadError(error):
    print error

loop = task.LoopingCall(getData)
loop.start(10, now=True)
reactor.run()

But when I do it this way, I get a memory leak. Can anyone help me find where it is?

Edit: I tried using Python's garbage collection module and got this output:

GARBAGE OBJECTS:
:: <HTTPClientFactory: http://www.google.com/>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.web.client' from '/usr/lib/python2.7/site-packages/twisted/web/client.pyc'>

:: {'status': '200', 'cookies': {'PREF': 'ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI', 'NID': '47=LxM9fbBBN-bVIeuLPOfvO-fgXOKw1n2suyZ2...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: InsensitiveDict({})
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.python.util' from '/usr/lib/python2.7/site-packages/twisted/python/util.pyc'>

:: {'preserve': 1, 'data': {}}
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: <Deferred at 0x29e2cf8 current result: None>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.defer' from '/usr/lib/python2.7/site-packages/twisted/internet/defer.pyc'>

:: {'_chainedTo': None, 'called': True, '_canceller': None, 'callbacks': [], 'result': None, '_runningCallbacks': False}
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: <<class 'twisted.internet.tcp.Client'> to ('www.google.com', 80) at 2445090>
        type: <class 'twisted.internet.tcp.Client'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'>
    line num: 681
        line: class Client(BaseClient):
        line:     """A TCP client."""
        line: 
        line:     def __init__(self, host, port, bindAddress, connector, reactor=None):
        line:         # BaseClient.__init__ is invoked later
        line:         self.connector = connector
        line:         self.addr = (host, port)
        line: 
        line:         whenDone = self.resolveAddress
        line:         err = None
        line:         skt = None
        line: 
        line:         try:
        line:             skt = self.createInternetSocket()
        line:         except socket.error, se:
        line:             err = error.ConnectBindError(se[0], se[1])
        line:             whenDone = None
        line:         if whenDone and bindAddress is not None:
        line:             try:
        line:                 skt.bind(bindAddress)
        line:             except socket.error, se:
        line:                 err = error.ConnectBindError(se[0], se[1])
        line:                 whenDone = None
        line:         self._finishInit(whenDone, skt, err, reactor)
        line: 
        line:     def getHost(self):
        line:         """Returns an IPv4Address.
        line: 
        line:         This indicates the address from which I am connecting.
        line:         """
        line:         return address.IPv4Address('TCP', *(self.socket.getsockname() + ('INET',)))
        line: 
        line:     def getPeer(self):
        line:         """Returns an IPv4Address.
        line: 
        line:         This indicates the address that I am connected to.
        line:         """
        line:         return address.IPv4Address('TCP', *(self.realAddress + ('INET',)))
        line: 
        line:     def __repr__(self):
        line:         s = '<%s to %s at %x>' % (self.__class__, self.addr, unsignedID(self))
        line:         return s

:: {'_tempDataBuffer': [], 'disconnected': 1, 'dataBuffer': '', '_tempDataLen': 0, 'realAddress': ('74.125.225.81', 80), 'connector': <twisted.internet.tcp.Connect...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: []
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: {'x-xss-protection': ['1; mode=block'], 'set-cookie': ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 0...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: ['-1']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['private, max-age=0']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['text/html; charset=ISO-8859-1']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 08:34:12 GMT; path=/; domain=.google.com', 'NID=47=LxM9...
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['gws']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['1; mode=block']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: []
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: <twisted.internet.tcp.Connector instance at 0x29e2cb0>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'>

:: ['Sun, 22 May 2011 08:34:12 GMT']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: {'reactor': <twisted.internet.selectreactor.SelectReactor object at 0x288bd10>, 'state': 'disconnected', 'factoryStarted': 0, 'bindAddress': None, 'factory': <H...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

So I see some unreleased references inside Twisted functions. How can I avoid them?

BGE

2 Answers

Try some strategies recommended in related questions. However, it is likely that you don't have a memory leak, you just have memory fragmentation.

It looks like the "Python memory leak detector" has a pretty severe bug. It enables DEBUG_LEAK which prevents all cycles from being collected. Put another way, it creates lots of massive leaks. If you just add some code to your example to report the contents of gc.garbage without enabling DEBUG_LEAK, then it remains empty (gc.garbage will be populated if any objects are actually leaking, even if you don't enable any gc debug flags).
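
To illustrate the point about debug flags, here is a minimal sketch that uses only the standard `gc` module. The `Node` class and the helper names are hypothetical, and it uses `gc.DEBUG_SAVEALL` (one of the flags that `DEBUG_LEAK` includes) rather than `DEBUG_LEAK` itself, to avoid the extra stderr output. It builds one ordinary reference cycle and counts what ends up in `gc.garbage` with and without the flag.

```python
import gc


class Node(object):
    """Hypothetical class, used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b cycle, unreachable once we return


def garbage_after_cycle(debug_flags):
    """Create one cycle, collect, and count the objects left in gc.garbage."""
    gc.set_debug(debug_flags)
    gc.collect()          # flush anything that was already pending
    del gc.garbage[:]     # start from an empty list
    make_cycle()
    gc.collect()
    count = len(gc.garbage)
    gc.set_debug(0)       # restore normal behaviour
    del gc.garbage[:]
    return count


without_flags = garbage_after_cycle(0)
with_saveall = garbage_after_cycle(gc.DEBUG_SAVEALL)
print("gc.garbage without debug flags: %d" % without_flags)
print("gc.garbage with DEBUG_SAVEALL:  %d" % with_saveall)
```

Without the flag the cycle is simply freed and `gc.garbage` stays empty. With the flag, the same collectable objects are kept in `gc.garbage`, which is why a report generated that way can look like a leak.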

Jean-Paul Calderone

The way you're scheduling your looping call may be a problem. You're not returning the Deferred from getData, so calls may pile up.

If retrieving your web page takes longer than 10 seconds, then it will call the second getData before the first getData completes. If you're using a website which attempts to throttle you (and google.com definitely does), then the more requests that pile up, the more it will delay you. Each attempt will take up some memory, which may look like a leak.

If that's the problem (although you should use the techniques that Jean-Paul suggests to discover if that's actually the problem), then you can address it by adding "return dgp" to the end of your getData function.

Glyph
  • Actually, in the production script the interval is 300 seconds, which is longer than any timeout, and I check that the previous `getData()` call has completed. This script is simplified for readability. – BGE May 22 '11 at 08:53