16

I've stumbled upon an issue with strstr in an old legacy codebase. There's a lot of code, but the test case basically comes down to this:

$value = 2660;
$link = 'affiliateid=1449&zoneid=6011&placement_id=11736&publisher_id=1449&period_preset=yesterday&period_start=2017-03-27&period_end=2017-03-27';

var_dump(strstr($link, $value));

I would expect this to return false, since "2660" is not in the string. However, it returns d=1449&zoneid=6011&placement_id=11736&publisher_id=1449&period_preset=yesterday&period_start=2017-03-27&period_end=2017-03-27.

I realise that $value should be a string, but I still don't understand why PHP doesn't cast it to a string, or why it finds this number in the link.

Actually, if I try with $value = '2660'; it returns false as expected.

Any idea what's happening?

laurent
  • 1
    Just stringify it: `var_dump(strstr($link, (string)$value));`. PHP is not a strict language and tends to give strange results when different types are compared. Just like `(1 == "1a")` evaluates to true in PHP 7. – Peon Mar 28 '17 at 10:15
  • 1
    Excuse my ignorance, but... what does "strstr" mean? I checked the documentation and I honestly don't have a clue why such a weird name was picked for this function! Is it "Stray String"? "Search This Range for a String"? "String String"? – T. Sar Mar 28 '17 at 17:37
  • @Tsar, yeah it's neither a good name nor a good function. Maybe it comes from strpos which returns the **pos**ition of the first occurrence, while strstr returns the **str**ing at the first occurrence. – laurent Mar 28 '17 at 18:00
  • 2
    @TSar: `strstr` finds a **str**ing inside another **str**ing. Don't blame PHP for this; the name comes from C. – jwodder Mar 28 '17 at 20:20
  • @jwodder Eh, that's not an excuse :P But fair point. C is known to be really... uh, let's say, _exotic_ regarding some naming decisions here and there, so I'm not surprised at all! – T. Sar Mar 28 '17 at 20:38

2 Answers

38

Short answer

When you run strstr($str, 2660), the $needle is resolved to the character "d", as if by calling chr(2660). The search therefore stops at the first "d" found in the given $str, which in this case is the 11th character.


Why are we calling chr(2660)?

Because when $needle is not a string, strstr converts it to an integer and searches for the character with that ordinal value. For 2660 that character is chr(2660), which is "d". The manual says:

If needle is not a string, it is converted to an integer and applied as the ordinal value of a character.
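
The rule quoted above can be reproduced explicitly, which also makes the result independent of the PHP version (as of PHP 8.0 a non-string needle is converted to a string instead, so the original call returns false there). This is a minimal sketch reusing $value and $link from the question; the variable names $legacyNeedle, $legacyResult and $stringResult are introduced here purely for illustration.

```php
<?php
$value = 2660;
$link = 'affiliateid=1449&zoneid=6011&placement_id=11736&publisher_id=1449&period_preset=yesterday&period_start=2017-03-27&period_end=2017-03-27';

// What older PHP versions effectively searched for: the single character
// whose ordinal value is the integer needle.
$legacyNeedle = chr($value);

// Matches at the first "d" (in "affiliateid") and returns the rest of the string.
$legacyResult = strstr($link, $legacyNeedle);

// Searching for the digits "2660" instead finds nothing.
$stringResult = strstr($link, (string) $value);

var_dump($legacyNeedle, $legacyResult, $stringResult);
```

Passing chr($value) by hand is only meant to show what happened under the hood; in real code the explicit (string) cast is almost certainly what was intended.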


But why does chr(2660) return "d" when ord("d") is 100?

Because values outside the valid range [0-255] will be bitwise and'ed with 255, which is equivalent to the following algorithm [source]:

while ($ascii < 0) {
    $ascii += 256;
}
$ascii %= 256;

So, 2660 becomes 100, and then when passed to strstr it's used as the ordinal value of the character and looks for character "d".
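
As a quick check of that arithmetic, here is a small sketch that applies both the bitwise AND and the loop form to 2660. The variable names $masked and $looped are made up for this example.

```php
<?php
$ascii = 2660;

// Bitwise AND with 255 keeps only the lowest 8 bits: 2660 = 10 * 256 + 100.
$masked = $ascii & 255;

// The loop form from the chr documentation, applied to the same input.
$looped = $ascii;
while ($looped < 0) {
    $looped += 256;
}
$looped %= 256;

// Both forms give 100, and chr(100) is "d".
var_dump($masked, $looped, chr($masked), ord('d'));
```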

Confusing? Yes. I also expected it to be cast to a string, but that's what we get for assuming things in programming. This, at least, is documented; you'd be surprised by the number of times something weird happens and there's no official explanation to be found. At least not as easily as following the same link you provided.


Why is it named strstr?

I did a little research and found this gem (Rationale for American National Standard for Information Systems - Programming Language - C) from back in 1989, in which every string-related function is given the prefix str, which is logical. Since PHP itself is written in C, that explains why the name carried over. My best guess is that it stands for searching for a string inside another string. The Rationale does say:

The strstr function is an invention of the Committee. It is included as a hook for efficient substring algorithms, or for built-in substring instructions.

Useful docs

  • Documentation for strstr
  • Documentation for chr
  • PHPwtf, a good resource for weirdness

Juan Cortés
  • 4
    I looked into [chr](http://php.net/manual/en/function.chr.php); apparently it takes the value modulo 256 (`$ascii %= 256;`), so 2660 turns into 100, which is `d`. That is actually a pretty strange design. – Peon Mar 28 '17 at 10:24
  • Yeah, it is kind-of documented in the PHP doc, [*"Example #2 Overflow behavior"*](http://php.net/manual/en/function.chr.php). Also, the first comment there truly sheds light on it: *"Note that if the number is higher than 256, it will return the number mod 256."*. Weird that this isn't noted more clearly in the *Return Values* or *Parameters* section though. – domsson Mar 28 '17 at 10:26
  • 3
    Well, PHP usually comes with a large set of confusing _weird_ things – Délisson Junio Mar 28 '17 at 18:20
8

I think this answers your question:

needle
If needle is not a string, it is converted to an integer and applied as the ordinal value of a character.

http://php.net/manual/en/function.strstr.php

Edit because of the comments:

chr(2660) returns the character "d", which is indeed in the haystack; that's why it doesn't return false as you expected.
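
If the goal is simply to test whether the digits occur in the link, one option is to cast the needle explicitly before searching. The helper below is a hypothetical sketch: contains_value is not a PHP function, and the shortened $link is just for illustration.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the textual form
// of $needle occurs anywhere in $haystack.
function contains_value(string $haystack, $needle): bool
{
    // Cast explicitly so an integer is never treated as a character code.
    return strpos($haystack, (string) $needle) !== false;
}

$link = 'affiliateid=1449&zoneid=6011&placement_id=11736';

var_dump(contains_value($link, 2660)); // "2660" is not in the link
var_dump(contains_value($link, 1449)); // "1449" is in the link
```

The strict `!== false` comparison matters because strpos can legitimately return 0 when the match is at the very start of the string.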

walther