if (strpos(htmlentities($storage->getMessage($i)),'chocolate')) 

Hi, I'm using Gmail OAuth access to find specific text strings in email messages. Is there a way to find text instances more quickly and efficiently than using strpos() as in the code above? Should I be using a hash technique?

Bob Cavezza
  • You **must** use the strict comparison operator with the `strpos()` function. This is because it may return an integer `0`, which means the string 'chocolate' was found at the start of the string. With the statement as you have it, this would evaluate to `FALSE`. Correct would be `if (strpos(htmlentities($storage->getMessage($i)), 'chocolate') !== FALSE)` – chigley Oct 06 '10 at 15:22
  • Why are you using `htmlentities()`? It slows everything down. – NullUserException Oct 06 '10 at 15:23
  • Because I'm also searching whether it's in the HTML. Would it take link URLs into consideration if I dropped the htmlentities() call? – Bob Cavezza Oct 06 '10 at 15:31
  • for example: Bob's Site - would this still return true if I omitted htmlentities? – Bob Cavezza Oct 06 '10 at 15:31
  • htmlentities doesn't change anything but specific characters like the quotation mark. It would be best to make sure they're also not encoded in the string you're searching, and not use that function in this case. For your example string "chocolate" the entities make no difference. – jjrv Oct 06 '10 at 15:40
  • @Bob It wouldn't make a difference. In your example, `strpos` finds both `chocoloate` (sic) and `Site` with or without `htmlentities()` – NullUserException Oct 06 '10 at 16:18
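The strict-comparison point from the first comment is easy to confirm. A minimal sketch, using a made-up `$message` string in place of `$storage->getMessage($i)`:

```php
<?php
// Made-up message body standing in for $storage->getMessage($i).
$message = 'chocolate cake recipe';

// strpos() returns the offset of the first match: 0 here.
$pos = strpos($message, 'chocolate');
var_dump($pos); // int(0)

// Loose check: 0 is falsy, so a real match is treated as "not found".
if ($pos) {
    echo "loose: found\n";
} else {
    echo "loose: not found\n"; // this branch runs
}

// Strict check: only a genuine false means "not found".
if ($pos !== false) {
    echo "strict: found\n"; // this branch runs
}
```

Only the strict check treats a match at offset 0 as a match.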

3 Answers


According to the PHP manual, yes: strpos() is the quickest way to determine whether one string contains another.

Note:

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

This is repeated time and again in the php.net pages for other string functions (I pulled this one from the strstr() page).

That said, there are two changes you should make to your statement:

if (strpos($storage->getMessage($i),'chocolate') !== FALSE)

This is because if(0) evaluates to false (so the block doesn't run), yet strpos() returns 0 when the needle is at the very beginning (position 0) of the haystack.

Also, removing htmlentities() will make your code run a lot faster. All htmlentities() does is replace certain characters with their HTML entity equivalents. For instance, it replaces every & with &amp;.

As you can imagine, checking every character in a string individually and replacing many of them takes extra memory and processor power. Not only that, but it's unnecessary if you plan on just doing a text comparison. For instance, compare the following statements:

strpos('Billy & Sally', '&'); // 6
strpos('Billy &amp; Sally', '&'); // 6
strpos('Billy & Sally', 'S'); // 8
strpos('Billy &amp; Sally', 'S'); // 12
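To check those offsets against what htmlentities() actually produces, a small sketch (the sample string is the one from the lines above):

```php
<?php
$raw = 'Billy & Sally';
$encoded = htmlentities($raw); // "&" becomes "&amp;"

var_dump($encoded);              // string(17) "Billy &amp; Sally"
var_dump(strpos($raw, 'S'));     // int(8)
var_dump(strpos($encoded, 'S')); // int(12): shifted by the 4 extra characters
```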

Or, in the worst case, you may even cause something true to evaluate to false.

strpos('<img src...', '<'); // 0
strpos('&lt;img src...','<'); // FALSE

In order to circumvent this you'd end up using even more HTML entities.

strpos('&lt;img src...', '&lt;'); // 0

But this, as you can imagine, is not only annoying to code but also redundant. You're better off leaving htmlentities() out entirely. It is usually only used when you're outputting text, not when you're comparing it.
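Putting both changes together, a sketch of a containment check on the raw text with a strict comparison. `messageContains()` is a hypothetical helper name, and the `$html` sample is made up for illustration:

```php
<?php
// Hypothetical helper: true if $needle occurs anywhere in $haystack.
// Searches the raw text (no htmlentities()) and uses !== false so that
// a match at offset 0 still counts.
function messageContains(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

$html = '<img src="chocolate.png">';

var_dump(messageContains($html, '<'));               // bool(true)
var_dump(messageContains(htmlentities($html), '<')); // bool(false): "<" became "&lt;"
var_dump(messageContains($html, 'chocolate'));       // bool(true)
```

Searching the unencoded text keeps markup characters like `<` findable, and plain words like "chocolate" match either way.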

stevendesu

strpos() is likely to be faster than preg_match() and the alternatives in this case. The best idea would be to run some benchmarks of your own with real example data and see what best fits your needs, although that may be overdoing it. Don't worry too much about performance until it starts to become a problem.
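A rough micro-benchmark of the kind suggested above might look like the following sketch. The haystack is synthetic filler (an assumption, not real mail), the run count is arbitrary, and the timings will vary by machine, so swap in real message bodies before drawing conclusions:

```php
<?php
// Synthetic haystack: 54,000 characters of filler, then the needle.
$haystack = str_repeat('lorem ipsum dolor sit amet ', 2000) . 'chocolate';
$needle = 'chocolate';
$pattern = '/' . preg_quote($needle, '/') . '/';
$runs = 1000;

// Time strpos() with a strict comparison.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $found = strpos($haystack, $needle) !== false;
}
$strposSeconds = microtime(true) - $start;

// Time preg_match() with an equivalent literal pattern.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $found = preg_match($pattern, $haystack) === 1;
}
$pregSeconds = microtime(true) - $start;

printf("strpos:     %.4f s\n", $strposSeconds);
printf("preg_match: %.4f s\n", $pregSeconds);
```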

neopickaze
  • It's already somewhat of a problem. I'm trying to quickly search for this string in users' email inboxes, and it takes about 2 seconds to sort through a single email. I would like to get this down to half a second or less. – Bob Cavezza Oct 06 '10 at 15:47
  • Are you sure the bottleneck is in strpos() and not in the inbox search? If you are using IMAP, let me know; I may be able to help further. – neopickaze Oct 07 '10 at 10:28
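One way to answer the bottleneck question is to time the fetch and the search separately. In this sketch, `fetchMessage()` is a hypothetical stub standing in for `$storage->getMessage($i)`. In real code, that call (network/IMAP) is what you would be measuring, and the stub's own timings mean nothing:

```php
<?php
// Hypothetical stub for $storage->getMessage($i); odd-numbered messages
// contain the needle so the search has something to find.
function fetchMessage(int $i): string
{
    return "Message $i: " . str_repeat('some text ', 500) . ($i % 2 ? 'chocolate' : '');
}

$fetchSeconds = 0.0;
$searchSeconds = 0.0;
$matches = 0;

for ($i = 1; $i <= 50; $i++) {
    $t0 = microtime(true);
    $body = fetchMessage($i);                 // retrieval cost
    $t1 = microtime(true);
    if (strpos($body, 'chocolate') !== false) { // search cost
        $matches++;
    }
    $t2 = microtime(true);

    $fetchSeconds += $t1 - $t0;
    $searchSeconds += $t2 - $t1;
}

printf("fetch: %.4f s, search: %.4f s, matches: %d\n", $fetchSeconds, $searchSeconds, $matches);
```

If the fetch total dwarfs the search total with real data, speeding up the string search will not help much.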

strpos() returns the position of the first occurrence of the needle, or FALSE (not NULL) if there is no match, so check the result with a strict comparison against FALSE:

if (strpos($storage->getMessage($i), 'chocolate') !== FALSE) {}
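A quick check of what strpos() returns on a miss, and why an is_null() test cannot detect it (the sample strings are made up):

```php
<?php
$pos = strpos('vanilla', 'chocolate'); // no match

var_dump($pos);           // bool(false)
var_dump(is_null($pos));  // bool(false): false is not null
var_dump($pos !== false); // bool(false): correctly reports "not found"
```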
Citricguy
kingunits