
A confession first: I'm a novice programmer who only does occasional scripting. I've been trying to work out the memory consumption of this simple piece of code and can't figure it out; searching the answered questions didn't help either. I'm fetching JSON data through a REST API, and the code below ends up consuming a lot of RAM. Windows Task Manager shows memory use growing with each iteration of the loop. I assign the response to the same variable on every API call, so I expected the previous response to be released.

while Flag == True:
        urlpart= 'data/device/statistics/approutestatsstatistics?scrollId='+varScrollId
        response = json.loads(obj1.get_request(urlpart))
        lstDataList = lstDataList + response['data']
        Flag = response['pageInfo']['hasMoreData']
        varScrollId = response['pageInfo']['scrollId']
        count += 1
        print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
        print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
    return lstDataList

I profiled memory usage with memory_profiler; here is what it shows:

    92  119.348 MiB    0.000 MiB       count = 0
    93  806.938 MiB    0.000 MiB       while Flag == True:
    94  806.938 MiB    0.000 MiB           urlpart= 'data/device/statistics/approutestatsstatistics?scrollId='+varScrollId
    95  807.559 MiB   30.293 MiB           response = json.loads(obj1.get_request(urlpart))
    96  806.859 MiB    0.000 MiB           print('Size of response within the loop  is {}'.format(sys.getsizeof(response)))
    97  806.938 MiB    1.070 MiB           lstDataList = lstDataList + response['data']
    98  806.938 MiB    0.000 MiB           Flag = response['pageInfo']['hasMoreData']
    99  806.938 MiB    0.000 MiB           varScrollId = response['pageInfo']['scrollId']
   100  806.938 MiB    0.000 MiB           count += 1
   101  806.938 MiB    0.000 MiB           print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
   102  806.938 MiB    0.000 MiB           print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
   103                                 return lstDataList

obj1 is an instance of Cisco's rest_api_lib class. Link to code here

In fact, the program ends up consuming ~1.6 GB of RAM. The data I'm fetching has roughly 570K records. The API limits each call to 10K records, so the loop runs ~56 times. Line 95 of the code consumes ~30 MiB of RAM according to the memory_profiler output. It's as if each iteration consumes 30M, ending up with ~1.6G, so the numbers are in the same ballpark. I can't figure out why memory keeps accumulating across loop iterations. Thanks.

Aditya

2 Answers


I would suspect it is the line lstDataList = lstDataList + response['data']

This accumulates response['data'] over time. Also, your indentation seems off; should it be:

while Flag == True:
    urlpart= 'data/device/statistics/approutestatsstatistics?scrollId='+varScrollId
    response = json.loads(obj1.get_request(urlpart))
    lstDataList = lstDataList + response['data']
    Flag = response['pageInfo']['hasMoreData']
    varScrollId = response['pageInfo']['scrollId']
    count += 1
    print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
    print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
return lstDataList

As far as I can tell, lstDataList will keep growing with each request, leading to the memory increase. Hope that helps, Happy Friday!
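As a side note on that line, here is a minimal sketch of the difference between `lst = lst + page` and `lst.extend(page)`. The page contents are made-up stand-ins for `response['data']`. Switching to `extend` avoids rebuilding the list on every pass, but it would not by itself shrink the memory held by the records.

```python
# Hypothetical stand-ins for two pages of response['data'].
page_one = [{"id": 1}, {"id": 2}]
page_two = [{"id": 3}]

records = []
original = records

# `records = records + page` builds a brand-new list and copies
# every existing reference into it on each iteration.
records = records + page_one
is_new_list = records is not original

# `records.extend(page)` appends to the existing list in place.
alias = records
records.extend(page_two)
is_same_list = records is alias

print(is_new_list, is_same_list, len(records))
```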

Sam
  • Thank you Sam. Yes, lstDataList is supposed to store all records. Each loop fetches 10K records and the loop keeps accumulating the records in the list till the 'hasMoreData' flag turns false. But I also keep printing the size of the list in each loop (line 102 in the code excerpt), and the size of the list after the last iteration is ~4M. – Aditya Jun 02 '20 at 12:03
  • So... yes, you need to free the previous block of records if you want to reclaim memory! As others have said, you can use `del` to delete objects. – Sam Jun 02 '20 at 15:44
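On the ~4M figure mentioned in the comments: `sys.getsizeof` on a list reports only the list object itself (essentially its array of pointers), not the objects it refers to. On a typical 64-bit CPython build each slot is 8 bytes, so roughly 570K records would come to a few MB of pointers, which seems consistent with what was printed. The sketch below compares the shallow size with a rough recursive total; `deep_getsizeof` is a hypothetical helper written for this illustration and the records are invented.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Roughly sum sys.getsizeof over nested dicts, lists, tuples and sets."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


# Invented records standing in for one page of API data.
records = [
    {"device": "router-{}".format(i), "latency": i * 1.5, "loss": i % 7}
    for i in range(10000)
]

shallow = sys.getsizeof(records)  # the list's own pointer array only
deep = deep_getsizeof(records)    # the list plus the dicts, strings and numbers inside
print("shallow: {} bytes, deep: {} bytes".format(shallow, deep))
```

The deep total is only an estimate, but it is the one that tracks what Task Manager and memory_profiler are reporting.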

it's as if each iteration consumes 30M

That is exactly what is happening. You need to free memory that you no longer need, for example once you have extracted the data from response. You can delete it like so:

del response

more on del

more on garbage collection
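One caveat, sketched below with invented data: `del` removes a name, not the object. Anything still referenced from another place, such as records already copied into the accumulating list, stays in memory. Rebinding `response` on the next iteration drops the old reference in the same way, so an explicit `del response` may change very little in this loop.

```python
# Hypothetical stand-in for one parsed API response.
response = {"data": [{"id": 1}, {"id": 2}], "pageInfo": {"hasMoreData": False}}

records = []
records.extend(response["data"])  # the list now references the same dict objects

first_record = response["data"][0]
del response  # unbinds the name; the outer dict can be collected

# The record dicts survive because `records` still refers to them.
record_survives = records[0] is first_record

try:
    response
    name_still_bound = True
except NameError:
    name_still_bound = False

print(record_survives, name_still_bound)
```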

W-B
  • Thanks, I had tried this; I didn't put it in the question because it didn't help. For my understanding, wouldn't response be reassigned in each iteration? Would it keep accumulating memory then? – Aditya Jun 02 '20 at 11:50
  • Yes, as lstDataList keeps growing, you are bound to keep accumulating memory. Deleting response slows the rate of memory growth, but the underlying problem is the unchecked growth of your list. That data needs to be saved somewhere, or you need to parse out what you don't need. – W-B Jun 02 '20 at 14:08
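One way to "save it somewhere" is to write each page out as it arrives instead of keeping every record in a list. The following is only a sketch under assumptions: `fetch_page` is a hypothetical callable that takes a scroll id and returns the already-parsed response dict (in the question that would wrap `json.loads(obj1.get_request(urlpart))`), and the output format is JSON Lines. The fake fetcher at the bottom exists purely to make the example runnable.

```python
import io
import json


def iter_pages(fetch_page, scroll_id):
    """Yield one page of records at a time, following the scroll id."""
    has_more = True
    while has_more:
        response = fetch_page(scroll_id)
        yield response["data"]
        page_info = response["pageInfo"]
        has_more = page_info["hasMoreData"]
        if has_more:
            scroll_id = page_info["scrollId"]


def dump_pages(pages, out_file):
    """Write every record as one JSON line; return the number written."""
    written = 0
    for page in pages:
        for record in page:
            out_file.write(json.dumps(record) + "\n")
            written += 1
    return written


def fake_fetch_page(scroll_id):
    """Stand-in for the real API call, for demonstration only."""
    pages = {
        "start": {
            "data": [{"id": 1}, {"id": 2}],
            "pageInfo": {"hasMoreData": True, "scrollId": "next"},
        },
        "next": {
            "data": [{"id": 3}],
            "pageInfo": {"hasMoreData": False, "scrollId": ""},
        },
    }
    return pages[scroll_id]


buffer = io.StringIO()  # a real script would open a file on disk instead
written = dump_pages(iter_pages(fake_fetch_page, "start"), buffer)
print("wrote {} records".format(written))
```

With this shape only one page is referenced at a time, so earlier pages become eligible for collection once they have been written.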