3

I need to design a function which uses hashtable. It basically inserts data into hashtable and search for items. Typically the function will take 15sec to 10min for execution. Should I implement this function in c++ and use a system call in PHP or should I implement it in php using associative arrays. Which would be more efficient. What are the advantage and disadvantage of each.

The key will be a string. The value will be one structure which contains two other structures. The first structure basically contains an array of integers and the second will contain an array of integer pair values

knightrider
  • 2,063
  • 1
  • 16
  • 29
  • 1
    Try it out. It depends on how you implement it in both languages, how much data you have, how much RAM, key spread, ... – Mat Jul 15 '12 at 14:11
  • 2
    Um. if he's talking about using C, why is this question tagged c++ ? Btw, if you WAS to use c++ , c++ 11 comes with hash tables, namely unordered_map, unordered_set, ect. As I'm positive that these containers would exceed the needs of the OP, maby the OP will read this and make a consideration, But with efficiency C++ is the king. – johnathan Jul 15 '12 at 14:22
  • Why the large diffenence between 15 secs and 10 mins? – Ed Heal Jul 15 '12 at 14:25
  • how many items in the hash table? – ThomasMcLeod Jul 15 '12 at 14:35
  • if it's that much of an issue, there's always the option of writing a PHP extension using C. – Spudley Jul 15 '12 at 14:35
  • @knightrider Implement it in c++ using unordered_map , which compiler are you using? – johnathan Jul 15 '12 at 14:36
  • @EdHeal because the running time is dependent on size of input data. And the size of input data varies a lot – knightrider Jul 15 '12 at 14:37
  • @ThomasMcLeod on average around 300 – knightrider Jul 15 '12 at 14:38
  • @Spudley could you please explain or could you give a pointer to the same – knightrider Jul 15 '12 at 14:38
  • @johnathon I am using g++. Basically the work is string processing and the rest is as I mentioned above. Do you think c++ using unordered map will be better than PHP code. Could you please explain when processing in php is better than using system call to C++ and/or vice versa – knightrider Jul 15 '12 at 14:41
  • 2
    @knightrider Sure, i'll give this a shot, and also a precursor warning to ALL those that may disagree with me : i speak from personal experience. With that out of the way, Anytime your doing an operation over a large data set (such as in your case hashing a key and storing it in a container then searching through that container) that's best to be implemented in a systems language(such as C++), and the fact that there is a container that exactly fits your needs in the c++ standard library means you don't have to roll your own. PHP it's self is an interpreted language. – johnathan Jul 15 '12 at 14:46
  • 1
    @knightrider and as such , it uses other containers that are written in languages such as C. What im really getting to is the levels of indirection you have to go through to accomplish what your wanting to do. Right now your data set is on average 300 items, in the future it very may well grow to 300 thousand. Would the performance of a PHP container then be acceptable? I highly doubt it would be. The more aliasing a cpu has to do, the slower it runs. Classic example, qsort vs std::sort. – johnathan Jul 15 '12 at 14:48
  • @knightrider So,in effect, your better off to implement any algorithm in C++, and then use it in your php code. – johnathan Jul 15 '12 at 14:52
  • @johnathon So basically you are telling that its better to do every job in c++ and then use a system call, assuming that the system call cost is covered. Am i right? – knightrider Jul 15 '12 at 14:54
  • @knightrider that cost of a system call, is the determining factor. If the algorithm in question runs faster than the time it takes to make a system call in php, then it is probably better to be implemented in php, however you'll find that that is a very rare case indeed, as the cost of making a system call is very cheap. – johnathan Jul 15 '12 at 14:56
  • @knightrider - The whole point of using a has is to locate the data regradless of the size of data set. You are either not using a hash or at best a very bad hash. – Ed Heal Jul 15 '12 at 15:05
  • @knightrider - try this: http://devzone.zend.com/303/extension-writing-part-i-introduction-to-php-and-zend/ – Spudley Jul 15 '12 at 20:06

2 Answers2

1

Apparently, PHP arrays are implemented as a linked hash table. See How is the PHP array implemented on the C level?.

In any case, for 300 items there would probably be little speed difference in the type of container you used. I would stay in PHP if possible for simplicity.

Community
  • 1
  • 1
ThomasMcLeod
  • 7,603
  • 4
  • 42
  • 80
  • I will try this also. But Could you please explain when processing in php is better than using system call to C++ and/or vice versa – knightrider Jul 15 '12 at 14:50
  • @knightrider, staying in PHP is better anytime calling out does not provide a significant increase in performance. What "significant" is depends on the application. Calling out of PHP usually would increase the development and maintanence efforts substantially. A call to a C++ container may be, say, 1.5 times faster. But a PHP array is O(1) access time, which means that this performance will not decrease as you add more elements. On the other hand, if access to this container is frequent and in a critical loop, then you would need to balance the extra development effort vs increased speed. – ThomasMcLeod Jul 15 '12 at 16:02
  • @knightrider, yes it is. So my point is that there is no asymptotic performance increase in c++, even though c++ may be faster by a constant factor where the number of elements is small. – ThomasMcLeod Jul 15 '12 at 16:21
1

PHP is well known for its fast associative array implementation, but according to my experiences, C++ is still faster. A few months ago I needed to implement fast prefix matching, there were thousands of prefixes in hash table and millions of strings to be matched. I made both, PHP and C++ implementations, and as I remember C++ was more than 10 times faster and consumed much less memory. But of course, it heavily depends also on your algorithm, not only on hash table implementation.

Peter Krejci
  • 3,182
  • 6
  • 31
  • 49