I am currently developing a website with the Symfony2 framework, and I have written a Command that runs every 5 minutes. It needs to read a tonne of RSS news feeds, pick up the new items from them and put them into our database.

At the moment the command takes about 45 seconds to run, and during those 45 seconds it uses between 50% and 90% of the CPU, even though I have already optimized it a lot.

So my question is: would it be a good idea to rewrite the same command in something else, for example Python? Are the RSS/Atom libraries available for Python faster and better optimized than the ones available for PHP?

Thanks in advance, Jaap

Carlos Granados
jaapz

3 Answers

You can parse the raw XML using lxml, which uses the underlying libxml2 C library:

http://lxml.de/parsing.html

Because the parsing is done in native code, it's fast.

Someone is already doing this:

Encoding error while parsing RSS with lxml

On the other hand, if the bottleneck is not XML parsing but downloading the data and sorting it out, a faster parser will not help much.
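
As a rough illustration of the parsing step, here is a minimal sketch of pulling the items out of an RSS 2.0 document with lxml. The sample feed, the `parse_rss_items` name and the choice of fields are assumptions made for the example, not something from the question. The fallback to the standard library's `xml.etree.ElementTree` is only there so the sketch still runs where lxml is not installed.

```python
try:
    # lxml wraps the libxml2 C library, so parsing happens in native code.
    from lxml import etree
except ImportError:
    # Fallback with a compatible API for the calls used below (assumption:
    # lxml may not be installed where this sketch is run).
    import xml.etree.ElementTree as etree

# Hypothetical sample feed; a real one would be the body of an HTTP response.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <guid>http://example.com/2</guid>
    </item>
  </channel>
</rss>"""


def parse_rss_items(raw_bytes):
    """Return a list of dicts with title, link and guid for each <item>."""
    # Parse from bytes so the encoding declared in the XML prolog is honoured.
    root = etree.fromstring(raw_bytes)
    items = []
    for item in root.iterfind("channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "guid": item.findtext("guid"),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
```

Comparing each `guid` against what is already in the database would then tell you which items are new. Atom feeds use namespaced elements, so they would need a different lookup than the one shown here.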

Mikko Ohtamaa

You could check the cache headers of the feeds first, before parsing them.
This way you can skip the expensive parsing for what is probably a lot of feeds.

Store a last_updated date in your db for each source and then check it against the cache headers. There are several, so see which fits best or is served most often, or check against all of them.
Relevant headers could be:

  • Expires
  • Last-Modified
  • Cache-Control
  • Pragma
  • ETag

But beware: you have to trust your feed sources.
Not every feed provides such headers, or provides them correctly.
But I am sure a lot of them do.
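
One way to act on ETag and Last-Modified is a conditional GET: send the stored values back as `If-None-Match` and `If-Modified-Since`, and a server that supports them answers `304 Not Modified` with no body when nothing changed. The following is a minimal sketch using only the Python standard library. The function names and the idea of storing both validators per feed are assumptions for the example.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators stored for a feed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Feed changed (or the server ignores validators): return the
            # body together with the new validators to store for next time.
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Feed unchanged since the stored validators: nothing to parse.
            return None, etag, last_modified
        raise
```

A feed that returns `None` as its body can be skipped entirely. Feeds that never send these headers simply get downloaded and parsed every time, as before.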

ivoba

I solved this by adding a usleep() call at the end of each iteration over a feed. This drastically lowered CPU and memory consumption. The process used to take about 20 minutes, and now takes only around 5!
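
The actual fix was PHP's usleep() inside the existing command. For anyone doing the same thing in Python, a rough equivalent is sketched below. The `process_feeds` name, the `handle_feed` callback and the 0.05 second pause are assumptions for the example, not the values I used.

```python
import time


def process_feeds(feed_urls, handle_feed, pause_seconds=0.05, sleep=time.sleep):
    """Call handle_feed for each URL, pausing briefly after each one."""
    processed = 0
    for url in feed_urls:
        handle_feed(url)
        processed += 1
        # Give the CPU back to other processes before the next feed.
        sleep(pause_seconds)
    return processed
```

The `sleep` parameter is only there so the pause can be swapped out in tests. In normal use you would leave it at the default.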

jaapz