1

I can't understand the following and I'm hoping someone can shed some light on it for me:

In C++ if I create a vector of test data containing 2M different bits of text (testdata) then create a map using these strings as the index values, then look up all the values, like this:

 //Create test data
for(int f=0; f<loopvalue; f++)
{   
    stringstream convertToString;
    convertToString << f;
    string strf = convertToString.str();
    testdata[f] = "test" + strf;
}

    time_t startTimeSeconds = time(NULL);

   for(int f=0; f<2000000; f++) testmap[ testdata[f] ] = f; //Write to map
   for(int f=0; f<2000000; f++) result = testmap[ testdata[f] ]; //Lookup

   time_t endTimeSeconds = time(NULL);
   cout << "Time taken " << endTimeSeconds - startTimeSeconds << "seconds." << endl;

It takes 10 seconds.

If I do seemingly at least the same in PHP:

<?php
$starttime = time();
$loopvalue = 2000000;

//fill array
for($f=0; $f<$loopvalue; $f++)
{
    $filler = "test" . $f;
    $testarray[$filler] = $f;
}

//look up array
for($f=0; $f<$loopvalue; $f++)
{
    $filler = "test" . $f;
    $result = $testarray[$filler];
}

$endtime = time();
echo "Time taken ".($endtime-$starttime)." seconds.";
?>

...it takes only 3 seconds.

Given that PHP is written in C does anyone know how PHP achieves this much faster text index lookup?

Thanks C

Columbo
  • 2,896
  • 7
  • 44
  • 54
  • 2
    Switch the string-keyed map to `std::unordered_map` and be amazed about the difference. You can do the same for the other map, coming to think of it. If you don't have that class, consider `std::tr1::unordered_map` from ``. – Kerrek SB Nov 09 '11 at 11:20
  • 3
    What compiler settings did you use for the C++ code? Especially which compiler and what level of optimisations? – tokage Nov 09 '11 at 11:24
  • Is result = testmap[ testdata[f] ] **= f** a typo? – Baffe Boyois Nov 09 '11 at 11:35
  • Sorry about typo...corrected. – Columbo Nov 09 '11 at 11:38
  • I've moved the timing around only the lookup and used unordered_map and sure enough it is much faster, the same speed between PHP and C++. I was expecting C++ to be alot faster. I'm using g++, I'm just looking into the compiler optimisations now. – Columbo Nov 09 '11 at 12:04

3 Answers3

5

Your loops are not absolutely equivalent algorithms. Note that in the C++ version you have

  1. testmap[ testdata[f] ] - this is actually a lookup + insert
  2. testmap[ testdata[f] ] - 2 lookups

In the PHP versions you just have insert in the first loop and lookup in the second one.

PHP is interpreted - generally if you code is faster in PHP, check the code first ! ;-)

Ventsyslav Raikov
  • 6,882
  • 1
  • 25
  • 28
2

I suspect you benchmark the wrong things. Anyway, I used your code (had to make some assumptions on your data types) and here are the results from my machine:

PHP: Time taken 2 seconds.

C++ (using std::map): Time taken 3 seconds.

C++ (using std::tr1::unordered_map): Time taken 1 seconds.

C++ compiled with

g++ -03

Here is my test C++ code:

#include <map>
#include <sstream>
#include <string>
#include <vector>
#include <iostream>
#include <tr1/unordered_map>


int main(){
    const int loopvalue=2000000;
    std::vector<std::string> testdata(loopvalue);
    std::tr1::unordered_map<std::string, int> testmap;
    std::string result;
    for(int f=0; f<loopvalue; f++)
    {   
        std::stringstream convertToString;
        convertToString << f;
        std::string strf = convertToString.str();
        testdata[f] = "test" + strf;
    }

    time_t startTimeSeconds = time(NULL);

    for(int f=0; f<loopvalue; f++) testmap[ testdata[f] ] = f; //Write to map
    for(int f=0; f<loopvalue; f++) result = testmap[ testdata[f] ]; //Lookup

    time_t endTimeSeconds = time(NULL);
    std::cout << "Time taken " << endTimeSeconds - startTimeSeconds << "seconds." << std::endl;
}

Conclusion:

You tested unoptimized C++ code, probably even compiled with VC++, which by default has a bounds check in std::vector::operator[] when compiled in debug mode.

There still is a difference of PHP to the optimised C++ code, when we use std::map, because of the difference in lookup complexity (see n0rd's answer), but C++ is faster when you use a Hashmap.

Community
  • 1
  • 1
tokage
  • 196
  • 3
  • Of course for benchmarks, one should use a timing function with a lot finer granularity, among other things. – tokage Nov 09 '11 at 12:17
0

According to another question, associative arrays in PHP are implemented as hash tables, which have search complexity of O(1) on average, while std::map in C++ is a binary tree with search complexity of O(log n), which is slower.

Community
  • 1
  • 1
n0rd
  • 11,850
  • 5
  • 35
  • 56
  • Arguably, it's not the theoretical complexity difference between hash table and binary search tree that matters here, but the fact that string comparison is very slow. The hashing makes *that* aspect a lot more efficient. If you tried the same with maps keyed on integers, I doubt the difference would be very large. – Kerrek SB Nov 09 '11 at 12:09
  • @Kerrek SB You are right, but you can shave of some time of the string comparison by putting the number first. – tokage Nov 09 '11 at 12:21