5

Hello to All

I have a PHP website that should use some cached data (stored in Memcache, for example). The data should be put into the cache by daemons that fetch it from web services, and some of it should be stored in a MySQL server too.

The daemons should do the following:

  1. Fetch foreign exchange rates, parse them and store them in the database as well as in two separate Memcache instances on separate machines.
  2. Fetch financial indices and store them in separate Memcache instances.
  3. Fetch large XML data and store it in two separate Memcache instances.
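
The first task in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the feed format ("PAIR=value" per line), the "fx:" key prefix and the function names are hypothetical, and plain dicts stand in for the database table and the two Memcache clients.

```python
from typing import Dict, List, MutableMapping


def parse_rates(raw: str) -> Dict[str, float]:
    """Parse lines such as 'EUR/USD=1.2950' (hypothetical feed format)."""
    rates: Dict[str, float] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pair, value = line.split("=", 1)
        rates[pair.strip()] = float(value)
    return rates


def store_everywhere(
    rates: Dict[str, float],
    caches: List[MutableMapping],
    database: MutableMapping,
) -> List[int]:
    """Write the rates to the database and to every cache.

    Returns the indexes of the caches that could not be written, so one
    unreachable cache machine does not stop the other from being updated.
    """
    database.update(rates)
    failed: List[int] = []
    for index, cache in enumerate(caches):
        try:
            for pair, value in rates.items():
                cache["fx:" + pair] = value  # "fx:" prefix is an assumption
        except Exception:
            failed.append(index)
    return failed
```

With real cache clients, the dict assignment would become whatever "set" call the chosen client library provides; the point is only that each cache is written independently and failures are reported rather than fatal.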

I am capable of writing these daemons in C/C++/Perl/PHP/Python.

I have to decide which language to use to implement these daemons. The advantage of using PHP is that I can use the API already used by the website application itself. Another advantage is that PHP is easy and everyone knows it, so I won't be tied to maintaining these daemons myself. On the other hand, PHP is slower and consumes much more resources.

The main disadvantage of using a language other than PHP is that code written in C/C++/Perl is harder to maintain. Nowadays, I guess it's not common to do this kind of task in C/C++/Perl. Am I wrong in saying that?

What would you recommend in this case?

Yossi
  • 3
    You're saying that it's easier to maintain php code than c or c++ code? – Falmarri Jan 08 '11 at 20:45
  • 1
    So, what's your question? Are you looking for people to dissuade you from using PHP? – chrisaycock Jan 08 '11 at 20:46
  • Usually, for medium-sized applications, it's true. And I speak from my own experience only ;) – Yossi Jan 08 '11 at 20:47
  • I'm asking what the best practice is for doing the job ... – Yossi Jan 08 '11 at 20:48
  • Is there a reason to choose daemons over cron jobs? Daemons are harder to program, maintain, deploy and verify than cron jobs, and in most cases cron jobs are more than enough to solve the problem. – clyfe Jan 08 '11 at 20:51
  • 1
    On your system, almost all (if not all) of your daemons are written in C/C++ – cristian Jan 08 '11 at 20:51
  • The daemons I have to implement should do the job every 30 seconds, by definition, so I suspect cron jobs won't be suitable for this... – Yossi Jan 08 '11 at 20:54
  • 1
    "The main disadvantage of using other language than PHP is that it's harder to maintain code written in C/C++/Perl. Nowadays, I guess it's not common to do these kind of tasks using C/C++/Perl. Am I wrong in saying that ?". Yes, quite wrong, but it sounds like you've already decided that you are going to write in PHP and are looking for validation of your decision. – the Tin Man Jan 09 '11 at 02:30

5 Answers

4

The best choice would probably be PHP for simplicity/code reuse.

PEAR System Daemon
Create daemons in php

EDIT
From what I can tell, the daemons just pass data around, so performance is nothing to worry about. As for resource usage, just make sure you don't hit PHP's memory limit (memory_limit): stream the data, or configure a generous limit. Abort and log operations that take too long. Reconnect to the database in a loop when an SQL operation fails, etc.
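
The advice to abort slow operations and retry failed ones could look something like the sketch below. It is written in Python for illustration, not PHP; the function names, the default timeout and the retry counts are assumptions, and the "opener" and "sleep" parameters exist only so the behaviour can be exercised without a network.

```python
import logging
import time
import urllib.request

log = logging.getLogger("daemon")


def fetch(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch a URL, giving up (and logging) if it fails or takes too long."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        log.warning("giving up on %s: %s", url, exc)
        return None


def call_with_retry(operation, attempts=3, delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(delay)  # pause before trying again
    raise last_error
```

A database write could be wrapped as call_with_retry(lambda: save(rates)), where save is a hypothetical function that reconnects and runs the SQL statement.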

NOTE OF CAUTION
Daemon programming is tricky and a lot of things can go wrong. Take all points of failure into consideration.

Also, note that Perl is far more established than PHP when it comes to daemons. I left out C/C++ because performance (passing data around) is not an issue and daemon programming is hard enough as it is; why add worries about memory leaks, segfaults, etc.?

clyfe
  • You can install a command-line-only version of PHP to minimize the footprint. You can also set up a shell init script to make sure the daemons start again after a system restart. – dqhendricks Jan 08 '11 at 21:06
  • clyfe: what do you mean by "Abort and log operations that take too long"? I didn't understand ... – Yossi Jan 08 '11 at 21:28
  • The idea is to keep up with the proposed 30-second loop. When you read from a service, it might sometimes take too long to respond (say, it has too many simultaneous requests) or stop responding altogether. These situations must be taken into account, because the slightest error can bring your daemon down. Also, tools like [Monit](http://mmonit.com/monit/) are invaluable. – clyfe Jan 08 '11 at 21:33
  • But I am still bothered by the notion that PHP is not the natural choice when you want to write daemons. For some reason, I don't feel too confident doing this in PHP... I know that it doesn't sound very professional or supported by facts, but nevertheless, do you understand what I am saying? – Yossi Jan 08 '11 at 22:15
  • Personally, I would choose Ruby (both web and daemon). So, on the zen side I cannot comfort your fears. Indeed PHP is not a natural choice, but try to do a mental mapping to see if it covers your needs (and it probably does). If there are other people doing daemons in PHP there is not much to worry about is there? – clyfe Jan 09 '11 at 10:50
4

Perl and Python are the default answers for writing such scripts. But it doesn't matter (much) which language you use if you write good code. The more important thing is how your script handles failure.

In the long run you may see your scripts fail occasionally for arbitrary reasons, and it may not be worth debugging them, because they usually do a fair job and it would be difficult to find where they went wrong.

I have a few Perl scripts doing the same kind of thing you are doing. To me, the tricky part was making sure that my scripts don't stay down for long, because I didn't want to miss a chunk of live streamed data.

For that I used monit. A great tool.

Nylon Smile
3

The best practice is to use whatever technology you know the best. You will:

  • implement the solution faster
  • be better able to debug problems you run into
  • more easily evaluate libs (or even know about them) that can offload some of the work for you
  • have an easier time maintaining and extending the code

Realistically, speed and resource usage are going to be relatively unimportant unless you actually have real performance requirements.

dietbuddha
2

Short: I would use Python.

Longer: I've tried PHP in CLI mode and experienced a lot of memory leaks, probably because of bad PHP libs, or PHP libs that were never designed for anything other than dying quickly in web-request mode (I'm suspicious of PDO, for example).

In the Python world I recently looked at portions of code from Shinken, a nice Nagios rewrite as Python daemons, very clever. See http://www.shinken-monitoring.org/the-global-architecture/ & http://www.shinken-monitoring.org/wiki/official/development-hackingcode . As it's a monitoring tool, you can certainly find some very good ideas there for daemons running repeating tasks.

Now, can I make a suggestion? Why not use Shinken or Centreon as the scheduler for the data fetching tasks? (And maybe soon Centreon with a Shinken engine instead of the Nagios engine, I hope.) This could be useful for detecting changes in external data, issues in fetches, etc.

Then the tasks that have to be done (fetch data, transform data, store data, etc.) are the job of an ETL. One nice open source tool is Talend ETL (Java). There are some scheduling and monitoring tools for Talend, but they are not open source (sort-of-open-source-where-you-must-pay-a-license). But adding an external scheduler like Nagios for the tasks should be easy (I hope). You'll need to check that memcached is available as a storage engine for Talend ETL, or code your own plugin.

So, all this to say that instead of the language you should maybe think about the tools. Or not: each tool adds its own complexity, so it depends on how much complexity you can take on. However, if you want to rebuild everything from scratch, Python is fast and efficient.

regilero
0

You should use the same language that the rest of your application is written in. That way you can reuse code and developer skills more easily.

However, as others have noted, PHP is bad for long-running daemons because it handles memory in a way which is liable to leak.

So I would run these tasks in a "cron" job which was periodically (re-) started, but make sure you don't run more copies of the tasks than you intend.

Cron jobs are more robust than daemons.

  • A cron job which fails and quits will start again next time it is scheduled
  • A cron job which contains memory leaks will release its memory when it ends its run anyway
  • A cron job which has its software updated (libraries etc.) automatically picks up the new versions on the subsequent run without any special effort.
  • "cron" already provides startup/shutdown scripts which your Ops team can use to control it, so you don't need to rewrite these. Your Ops team already know how to operate "cron", and know how to comment out crontab entries if they want to temporarily disable it.
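
One common way to make sure a cron-started task never runs in more copies than intended is a non-blocking lock file. The following is a minimal sketch for Unix-like systems, written in Python for illustration; the function name and the lock path are hypothetical.

```python
import fcntl


def acquire_lock(path):
    """Try to take an exclusive lock on the file at path.

    Returns the open file object while the lock is held, or None if
    another run already holds it. The lock is released when the file is
    closed or the process exits, so a crashed run cannot leave a stale
    lock behind.
    """
    handle = open(path, "a")  # "a" creates the file without truncating it
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

A job would call acquire_lock("/tmp/fetch-rates.lock") (a made-up path) at startup and exit immediately if it gets None back, keeping the returned file open for as long as it runs.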
MarkR