
I know that strings in PHP are... strings, but for example I can do:

$str = 'String';
echo $str[0];
echo $str[1];

//result
S
t

echo count($str);
//result
1

Why can I walk through them like an array but not count them with count()? (I know that I can use strlen().)

Yogesh Suthar
dofenco

2 Answers


Because that's how it works. You can access specific byte offsets using bracket notation. But that doesn't mean the string is an array and that you can use functions which expect arrays on it. $string[int] is syntactic sugar for substr($string, int, 1), nothing more, nothing less.
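To make the distinction concrete, here is a minimal sketch using the string from the question. It only reads from the string; variable names such as `$first` and `$spaced` are illustrative. Note that on PHP 8 `count($str)` throws a `TypeError`, so the `1` shown in the question is what older versions returned.

```php
<?php
$str = 'String';

// Bracket access reads one byte; substr() with length 1 gives the same result.
$first = $str[0];              // 'S'
$same  = substr($str, 0, 1);   // 'S'

// The value is still a string, not an array.
$isArray = is_array($str);     // false

// strlen() is the right tool for the length (in bytes).
$length = strlen($str);        // 6

// Walking the string by index works because each offset is readable.
$spaced = '';
for ($i = 0; $i < $length; $i++) {
    $spaced .= $str[$i] . ' ';
}
echo trim($spaced), "\n";      // S t r i n g
```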

deceze
  • I'd rather call them *character offsets*, but you're right about the sugar. – GolezTrol Jun 19 '13 at 14:19
  • LOL, well timed comment. Well, they're *not* character offsets! Try: `$s = '漢字'; $s[1];` – deceze Jun 19 '13 at 14:19
  • LOL, good timing indeed. Rather than `characters`, I should have used `code points`, but `bytes` seems to be referring too much to the underlying structure and assumes PHP is not and never will be UTF-16 or any other multi-byte character set. I must admit I'm not sure though, and I can't find what the docs have to say about it. – GolezTrol Jun 19 '13 at 14:23
  • *Byte offset* is exactly what it is; it's not code point or character. See [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/). And whatever PHP does in the future is entirely open to speculation; there's nothing like character-based strings coming in the foreseeable future as far as I know. – deceze Jun 19 '13 at 14:26
  • [The docs say](http://php.net/manual/en/language.types.string.php): `"Characters within strings may be accessed and modified by specifying the zero-based offset of the desired character after the string using square array brackets, as in $str[42]. Think of a string as an array of characters for this purpose."` Point is, you are (illegally, basically) putting Unicode information in what PHP thinks is an ANSI string. Still, $str[n] refers to an ANSI code point rather than a byte (although they are the same), allowing you to extract the separate bytes of your Unicode characters. – GolezTrol Jun 19 '13 at 14:29
  • Thanks for the document about Unicode. I know it, and I've even written [a similar one](http://www.nldelphi.com/Forum/showthread.php?t=35747) myself. ;) – GolezTrol Jun 19 '13 at 14:29
  • Well, in this case, the link was more about the PHP specific parts at the end than Unicode as such. And PHP was obviously conceived with single byte encodings in mind; anything that refers to "characters" in regular "strings" is basically wrong, since it doesn't take encoding into account at all. Since PHP strings also act as container for binary data (`file_get_contents('image.jpg')`), strings really are encoding agnostic *byte arrays*. Yeah, that's not going to be confusing to @dofenco at all... ;) – deceze Jun 19 '13 at 14:39
  • The notion that accessing the bytes (not characters) of a string in PHP is merely syntactic sugar is wrong, since you can assign (change) bytes within a string using an index, but not make such changes with substr. Using indexing, PHP allows you to treat strings as arbitrary blocks of memory, needed to process a variety of non-text data sources. – Greg Young Mar 14 '16 at 02:31
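The two points raised in these comments can be checked directly. The sketch below assumes the source file is saved as UTF-8 and, for the character-aware calls, that the mbstring extension is loaded; the variable names are illustrative.

```php
<?php
$s = '漢字';                    // two characters, three bytes each in UTF-8

$byteLength = strlen($s);       // 6, strlen() counts bytes
$secondByte = bin2hex($s[1]);   // 'bc', a single raw byte from the middle of '漢'

// Character-aware functions come from the mbstring extension.
if (function_exists('mb_strlen')) {
    $charLength = mb_strlen($s, 'UTF-8');        // 2
    $secondChar = mb_substr($s, 1, 1, 'UTF-8');  // '字'
}

// Offsets can also be written to, which substr() cannot do.
$t = 'String';
$t[0] = 's';                    // $t is now 'string'
```

So the offset is a byte offset, and bracket access is slightly more than a read-only shortcut for `substr()`.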

Because strings are not arrays. They let you read the bytes at given offsets (which aren't necessarily letters in a multi-byte character string) using the same syntax for your convenience, but that's about it.

Arrays can have keys as well and can be sorted. If strings were full arrays, you could give each letter a key, or sort the letters alphabetically using one of the array functions.
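If you do want array behaviour, you can convert the string first. A minimal sketch, assuming a single-byte string like the one in the question (`str_split()` splits by byte, so it is not safe for multi-byte text):

```php
<?php
$str = 'String';

// str_split() returns a real array with one element per byte.
$letters = str_split($str);        // ['S', 't', 'r', 'i', 'n', 'g']

// Array functions now work as expected.
$count = count($letters);          // 6

// sort() compares byte values, so uppercase 'S' sorts before the lowercase letters.
sort($letters);
$sorted = implode('', $letters);   // 'Sginrt'
```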

Long story short: a string is not an array, even though a small part of the syntax is the same.

GolezTrol