
I have a Python program which runs in a loop, downloading 20k RSS feeds using feedparser and inserting the feed data into an RDBMS.

I have observed that it starts at 20-30 feeds a minute and gradually slows down. After a couple of hours it is down to 4-5 feeds an hour. If I kill the program and restart it from where it left off, the throughput is again 20-30 feeds a minute.

It is certainly not MySQL that is slowing down.

What could be potential issues with the program?

Peter
    Without seeing code it is almost impossible to answer. – Padraic Cunningham Feb 23 '16 at 13:20
  • @PadraicCunningham I disagree. It is possible to answer to a good extent. In all likelihood he is having a memory related issue. The code would help of course. – Sid Feb 23 '16 at 13:22
  • @Sid, I disagree, it is possible to throw many darts and see what sticks like your answer is doing but without more information that is all anyone will be doing. – Padraic Cunningham Feb 23 '16 at 13:24
  • 1
    @Sid you need to clarify a few things before launching into guessing at an answer. – Peter Wood Feb 23 '16 at 13:24
  • 1
    For everyone who opposed to Sid's answer - His answer helped tremendously and I was able to solve problem based on that. Please understand sometime people work with constraint and they can's share code. If you think you cant answer - at least don't demotivate others please. – Peter Feb 26 '16 at 03:45

1 Answer


In all likelihood the issue is to do with memory. You are probably holding the feeds in memory or accumulating objects that aren't being garbage collected. To diagnose:

  1. Look at the size of your process (Task Manager on Windows, top on Unix/Linux) and watch whether it keeps growing as feeds are processed.
  2. Then use a memory profiler to figure out what exactly is consuming the memory.
  3. Once you have found that, you can restructure the code.
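Step 1 can also be done from inside the program itself. A minimal sketch (assuming a Unix-like system, since the stdlib `resource` module is not available on Windows; `feed_urls` and `process_feed` are hypothetical stand-ins for your own loop):

```python
import resource

def rss_kb():
    """Return this process's peak resident set size.

    Note: ru_maxrss is reported in kilobytes on Linux but in
    bytes on macOS, so treat the number as a relative measure.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def run(feed_urls, process_feed, log_every=100):
    # Log memory periodically; if the number climbs steadily
    # across thousands of feeds, something is accumulating.
    for i, url in enumerate(feed_urls, 1):
        process_feed(url)
        if i % log_every == 0:
            print(f"{i} feeds done, peak RSS: {rss_kb()}")
```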

A few tips:

  1. Do an explicit garbage collection call (gc.collect()) after clearing any large data structures you no longer need.
  2. Use a multiprocessing scheme where you spawn multiple processes that each handle a smaller batch of feeds and then exit.
  3. Consider moving to a 64-bit system if you are on a 32-bit one.
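Tips 1 and 2 combined might look like the sketch below. Everything here (the batch size, the `handle_batch` body) is illustrative rather than from the question; the key detail is `maxtasksperchild=1`, which makes each worker process exit after one batch so its memory is returned to the OS:

```python
import gc
from multiprocessing import Pool

def handle_batch(urls):
    # Stand-in for the real work: parse each feed (e.g. with
    # feedparser.parse) and insert its entries into the database.
    results = []
    for url in urls:
        parsed = len(url)  # placeholder for the parsed feed
        results.append(parsed)
    parsed = None
    gc.collect()  # explicit collection after clearing references
    return sum(results)

def run_in_batches(urls, batch_size=500, workers=4):
    batches = [urls[i:i + batch_size]
               for i in range(0, len(urls), batch_size)]
    # Each worker handles one batch and is then replaced,
    # so per-process memory growth cannot accumulate.
    with Pool(processes=workers, maxtasksperchild=1) as pool:
        return pool.map(handle_batch, batches)
```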

Some suggestions on memory profiler:

  1. https://pypi.python.org/pypi/memory_profiler This one is quite good and the decorators are helpful
  2. https://stackoverflow.com/a/110826/559095
Sid
  • Thanks Sid - this is very helpful. I did see memory footprint going up in the task manager. Any suggestions on memory profiler to use? Also you mentioned about multiprocessing scheme - any suggestions there? – Peter Feb 23 '16 at 13:30
  • Updated answer with a couple of suggestions – Sid Feb 23 '16 at 14:03