I'm curious what the most performant method of doing string transformations is. Given an input string and a set of translations, which method is the most efficient in general? I currently use strtr(), but I have also tested various looping methods, str_replace() with an array, etc. The strtr() method benchmarks the fastest on my system, depending on the translations, but I'm curious whether there are faster methods I haven't thought of yet.

If it's pertinent, my particular use case involves transforming 2-byte strings into ANSI color sequences for a terminal. Example:

// In practice, the number of translations is much greater than one...
$out = strtr("|rThis should be red", array('|r' => "\033[31m"));
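
For illustration, a slightly larger map might look like the sketch below. Only the '|r' code comes from the example above; the other codes, the COLOR_MAP constant, and the colorize() helper are hypothetical names chosen for this sketch.

```php
<?php
// Hypothetical color codes; only '|r' appears in the example above.
const COLOR_MAP = [
    '|r' => "\033[31m", // red
    '|g' => "\033[32m", // green
    '|b' => "\033[34m", // blue
    '|n' => "\033[0m",  // reset all attributes
];

// Illustrative helper: translate every 2-byte code in $text in one strtr() call.
function colorize(string $text): string
{
    // With an array argument, strtr() does not re-process text it has already replaced.
    return strtr($text, COLOR_MAP);
}

$out = colorize("|rThis should be red|n and this should not");
```

Defining the map once as a constant avoids rebuilding the array on every call, which may matter if the function sits on a hot path.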
Marco Demaio
FtDRbwLXw6
  • Does it really, really matter? Are you having actual performance issues? – Pekka Oct 01 '11 at 13:33
  • @Pekka: Not every question related to performance is a case of premature optimization. – FtDRbwLXw6 Oct 01 '11 at 13:41
  • Well, most around here on SO are. Hence my question. :) – Pekka Oct 01 '11 at 13:42
  • @Pekka: I'm mostly just curious, as noted above. The `strtr()` method is "fast enough" for now, but my application centers around sending large amounts of string data to many users, and all of it has to pass through this bottleneck. If I can improve the performance at all, it wouldn't be wasted effort :) – FtDRbwLXw6 Oct 01 '11 at 13:49
  • @Pekka: let's be sincere, isn't benchmarking this kind of stuff also fun?! – Marco Demaio Feb 27 '13 at 10:15

3 Answers

For simple replacements, strtr seems to be faster, but when you have complex replacements with many search strings, it appears that str_replace has the edge.

Steve Tauber
T0xicCode

strtr() performs best with straight character replacements. Longer strings give str_replace() the edge.

For example, the code below yields the following results on my (shared web hosting) system:

Execution timings on PHP 7.0.6:
test_strtr(): 0.37670969963074; result: Lorem ipsum dolor sit amet\, \tconsectetur adipiscing elit\, \nsed do eiusmod \%tempor \'incididunt\' ut labore et DELIMITER dolore\; trunc8 \\magna \"aliqua\".
test_str_ireplace(): 0.73557734489441; result: Lorem ipsum dolor sit amet, \tconsectetur adipiscing elit, \\nsed do eiusmod \%tempor \'incididunt\' ut labore et de-limiter dolore\; trunc8 \\magna \"aliqua\".
test_str_replace(): 0.28119778633118; result: Lorem ipsum dolor sit amet, \tconsectetur adipiscing elit, \\nsed do eiusmod \%tempor \'incididunt\' ut labore et DELIMITER dolore\; trunc8 \\magna \"aliqua\".

When we take out 'delimiter' and 'truncate', results become:

Execution timings on PHP 7.0.6:
test_strtr(): 0.14877104759216; result: Lorem ipsum dolor sit amet\, \tconsectetur adipiscing elit\, \nsed do eiusmod \%tempor \'incididunt\' ut labore et DELIMITER dolore\; truncate \\magna \"aliqua\".
test_str_ireplace(): 0.58186745643616; result: Lorem ipsum dolor sit amet, \tconsectetur adipiscing elit, \\nsed do eiusmod \%tempor \'incididunt\' ut labore et DELIMITER dolore\; truncate \\magna \"aliqua\".
test_str_replace(): 0.20531725883484; result: Lorem ipsum dolor sit amet, \tconsectetur adipiscing elit, \\nsed do eiusmod \%tempor \'incididunt\' ut labore et DELIMITER dolore\; truncate \\magna \"aliqua\".

So, as of PHP 7.0.6, strtr() suffers a considerable penalty with longer replacements. The code:

const LOOP = 333;
const SQL_ESCAPE_MAP = array( // see https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#MySQL_Escaping
    "\x00"      => '\x00',       // NUL
    "\n"        => '\n',         // LF
    "\r"        => '\r',         // CR
    "\\"        => '\\\\',       // backslash
    "'"         => "\'",         // single quote
    '"'         => '\"',         // double quote
    "\x1a"      => '\x1a',       // SUB or \Z (substitute for an invalid character)
    "\t"        => '\t',         // TAB
    "\x08"      => '\b',         // BS
    '%'         => '\%',         // Percent
    '_'         => '\_',         // Underscore
    ';'         => '\;',         // Semicolon
    ','         => '\,',         // Comma
    'delimiter' => 'de-limiter', // SQL delimiter keyword
    'truncate'  => 'trunc8',     // SQL truncate keyword
);

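// Note: these lists omit the '_' and ',' entries present in SQL_ESCAPE_MAP,
// so the str_replace() variants do not perform exactly the same replacements.
const SQL_SEARCH = array("\x00", "\n", "\r", "\\", "'", '"', "\x1a", "\t", "\x08", "%", ";", 'delimiter', 'truncate');
const SQL_REPLACE = array('\x00','\n','\r','\\\\',"\'",'\"', '\x1a', '\t', '\b', '\%', '\;', 'de-limiter', 'trunc8');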

const TEST_STRING = "Lorem ipsum dolor sit amet, \tconsectetur adipiscing elit, \nsed do eiusmod %tempor 'incididunt' ut labore et DELIMITER dolore; truncate \magna \"aliqua\".";

function test_strtr() {
    for ($i = 0; $i < LOOP; $i++) {
        $new_string = strtr(TEST_STRING, SQL_ESCAPE_MAP);
    }
    return $new_string;
}
function test_str_ireplace() {
    for ($i = 0; $i < LOOP; $i++) {
        $new_string = str_ireplace(SQL_SEARCH, SQL_REPLACE, TEST_STRING);
    }
    return $new_string;
}
function test_str_replace() {
    for ($i = 0; $i < LOOP; $i++) {
        $new_string = str_replace(SQL_SEARCH, SQL_REPLACE, TEST_STRING);
    }
    return $new_string;
}

$timings = array(
    'test_strtr' => 0,
    'test_str_ireplace' => 0,
    'test_str_replace' => 0,
);

for($i= 0; $i < LOOP; $i++) {
    foreach(array_keys($timings) as $func) {
        $start = microtime(true);
        $$func = $func();
        $timings[$func] += microtime(true) - $start;
    }
}

echo '<pre>Execution timings on PHP ' . PHP_VERSION . ":\n"; // PHP_VERSION instead of phpversion('tidy'), which reports the tidy extension's version
foreach(array_keys($timings) as $func) {
    echo $func . '(): ' . $timings[$func] . '; result: ' . $$func . "\n";
}
echo "</pre>\n";
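
The result strings in the timings above are not identical: strtr() yields `\nsed` where str_replace() yields `\\nsed`. Part of the reason is that the two functions have different semantics, not just different speeds. The sketch below tries to isolate that effect with a deliberately tiny two-entry map (the map and variable names are illustrative, not taken from the benchmark).

```php
<?php
// Illustrative map: the second search string (a backslash) also occurs
// inside the first replacement (backslash + "n").
$map = [
    "\n" => '\n',   // newline   -> backslash followed by "n"
    "\\" => '\\\\', // backslash -> two backslashes
];

$input = "a\nb"; // "a", newline, "b"

// strtr(): replaced text is not searched again, so the new backslash is left alone.
$viaStrtr = strtr($input, $map);

// str_replace(): each search/replace pair is applied in order to the output of
// the previous pair, so the backslash introduced by the first pair gets doubled.
$viaStrReplace = str_replace(array_keys($map), array_values($map), $input);

var_dump($viaStrtr);      // string(4) "a\nb"
var_dump($viaStrReplace); // string(5) "a\\nb"
```

When comparing timings, it is worth checking that both variants actually produce the output you want, since the ordering of the search array matters for str_replace() but not for strtr().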

Note:

  1. This sample code is not meant as a production substitute for mysqli::real_escape_string when no DB connection is available (there are issues around binary and multi-byte encoded input).

  2. Clearly, the differences are minor. For mnemonic reasons (how matches and replacements are organized), I prefer the associative array that strtr() takes natively, although the same map can be passed to str_replace() via array_keys() and array_values(). The differences in this case are definitely within the realm of micro-optimization and can vary widely with different inputs. If you need to process huge strings thousands of times per second, benchmark with your specific data.

SashaK

I ran a trivial benchmark of the two functions for my own needs. The goal is to change every lowercase 'e' to an uppercase 'E'.

<?php

$stime = time();

for ($i = 0; $i < 1000000; $i++) {
    str_replace('e', 'E', "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit.");
}

echo time() - $stime . "\n";

?>

This code, using str_replace, runs in 6 seconds. Now the same with the strtr function:

<?php

$stime = time();

for ($i = 0; $i < 1000000; $i++) {
    strtr("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit.", 'e', 'E');
}

echo time() - $stime . "\n";

?>

It took only 4 seconds.

So as stated by T0xicCode, for this particularly simple case, strtr is indeed faster than str_replace, but the difference is not so significant.
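
Since time() only has one-second resolution, a variant using microtime(true) may give a finer-grained picture. The following is a minimal sketch under stated assumptions: the time_calls() helper, the shortened test string, and the iteration count are all illustrative choices, and the closure call adds a small constant overhead to both measurements.

```php
<?php
// Illustrative timing helper: runs $fn $iterations times and returns elapsed seconds.
function time_calls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Shortened stand-in for the long Lorem ipsum string used above.
$text = str_repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 15);

$timings = [
    'str_replace' => time_calls(function () use ($text) {
        return str_replace('e', 'E', $text);
    }, 100000),
    'strtr' => time_calls(function () use ($text) {
        return strtr($text, 'e', 'E');
    }, 100000),
];

// Print one line per function with the elapsed time in seconds.
foreach ($timings as $name => $seconds) {
    printf("%-12s %.4f s\n", $name, $seconds);
}
```

The absolute numbers will depend on the PHP version and the machine, so they are best read as a relative comparison on your own system.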

sylozof