How to count the number of characters in a textarea optimally?

Question

I need to count the characters in a textarea's value. That textarea might contain 5000 characters, but I only need to know whether it contains more than 20 characters or not. I can do that with a string-length function such as mb_strlen(). Something like this:

$content = $_POST['textarea_content'];
$content_length = mb_strlen($content, 'utf8');
if ( $content_length > 20 ) {
    // do stuff
}

But my approach isn't optimized at all. It counts all the characters and then compares the total. As I said, sometimes there are a lot of characters, such as 5000. So is there any way to stop counting after 20 characters?

@ameenulla0007 Look, I just need to know *"are there more than 20 characters or not?"*, just that. I don't need to know the total number of characters. So counting *(for example)* 5000 characters is wasted work. — stack, Aug 26 '16 at 03:57
Your approach is fine. I just quickly tested `strlen()` with a 4m+ characters file, and it still only took a fraction of a second to get the count. Are you actually running into an issue with your counting at the moment? — Drown, Aug 26 '16 at 04:01
@Drown No no .. there isn't any problem .. I just like to improve my code .. Actually I don't have any speed problem and honestly I'm a bit obsessive. So counting all the characters bothers me, because knowing the total is useless to me .. I just need to know whether that number is bigger than 20 or not. — stack, Aug 26 '16 at 04:04
This looks like a micro-optimization and any speed improvement you might gain will be offset by everyone else reading your code having to wonder why you did it this way. But `mb_strlen(mb_substr($content, 0, 21, 'utf-8'), 'utf-8') > 20` could answer your question. — DCoder, Aug 26 '16 at 04:08

score 1 · Accepted Answer · edited May 23 '17 at 11:51

Strings in PHP store their length in an internal field, so the runtime of strlen($str) does not depend on the length of the string at all.

Your problem is that you want to use mb_strlen in order to get the number of characters in the string (and not the number of bytes). In other words, you want the length in characters even if the string contains multibyte characters, and for that the string has to be scanned.
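To make the byte/character distinction concrete, here is a minimal sketch. It assumes PHP 7+ (for the `\u{...}` escape) and the mbstring extension; the sample string and variable names are only illustrative.

```php
<?php
// "\u{00E9}" is "é", which UTF-8 encodes as two bytes.
$text = "h\u{00E9}llo";

$byteCount = strlen($text);              // counts bytes
$charCount = mb_strlen($text, 'UTF-8');  // counts characters

echo $byteCount, " bytes, ", $charCount, " characters\n"; // prints "6 bytes, 5 characters"
```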

If you know that your string is UTF-8, you can use that for optimization. UTF-8 uses at most 4 bytes per character, so 20 characters take at most 80 bytes. If isset($str[80]) is true, the string has at least 81 bytes, so you know for sure that it has more than 20 chars (and probably many more). If not, you will still have to use the mb_ functions to get the information you need.
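The byte bound only works in one direction, which a few sample strings can illustrate. This is a sketch assuming PHP 7+ and UTF-8 input; the variable names are made up for the example.

```php
<?php
$twentyEmoji    = str_repeat("\u{1F600}", 20); // 20 chars, 4 bytes each = 80 bytes
$twentyOneEmoji = str_repeat("\u{1F600}", 21); // 21 chars, 4 bytes each = 84 bytes
$thirtyAscii    = str_repeat("a", 30);         // 30 chars, 1 byte each  = 30 bytes

var_dump(isset($twentyEmoji[80]));    // false: only byte offsets 0..79 exist
var_dump(isset($twentyOneEmoji[80])); // true: more than 80 bytes, so more than 20 chars
var_dump(isset($thirtyAscii[80]));    // false, although the string has more than 20 chars
```

The last case is why a false result from isset does not settle the question and the mb_ functions are still needed as a fallback.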

The reason for using isset instead of strlen is that you asked about the optimized way. You can read more in this question regarding the two.

To sum it up - your optimized code would probably be:

if (isset($str[80]) || mb_strlen(mb_substr($str, 0, 21, 'utf-8'), 'utf-8') > 20) {
    ....
}

In PHP, the isset part is evaluated first, and if it returns true the other part does not run (so you get the optimization here from both the isset and the fact that you don't need to run the mb_ functions).

If you have more information about the characters in your string, you can use it for further optimization. If, for example, you know that all of the chars in your string are from the lower range of UTF-8 (at most 2 bytes each), you don't have to use $str[80]; you might as well use $str[40].
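One way to package this idea is a small helper with the per-character byte bound as a parameter. This is only a sketch: the function name `hasMoreCharsThan` and its parameters are hypothetical, and it assumes PHP 7+, the mbstring extension, and valid UTF-8 input.

```php
<?php
/**
 * Hypothetical helper: reports whether $str has more than $limit characters.
 * $maxBytesPerChar is an assumption supplied by the caller (4 is always safe
 * for UTF-8). A value that is too low can produce a wrong "true".
 */
function hasMoreCharsThan(string $str, int $limit, int $maxBytesPerChar = 4): bool
{
    // Fast path: $limit characters can occupy at most $limit * $maxBytesPerChar
    // bytes, so a byte at that offset proves there are more characters.
    if (isset($str[$limit * $maxBytesPerChar])) {
        return true;
    }

    // Slow path: the string is at most $limit * $maxBytesPerChar bytes long,
    // so counting its characters is cheap.
    return mb_strlen($str, 'UTF-8') > $limit;
}

var_dump(hasMoreCharsThan(str_repeat('a', 5000), 20));         // true (fast path)
var_dump(hasMoreCharsThan(str_repeat("\u{1F600}", 20), 20));   // false (slow path)
var_dump(hasMoreCharsThan(str_repeat("\u{00E9}", 21), 20, 2)); // true (fast path, 42 bytes)
```

Because the slow path only ever sees short strings, this sketch calls mb_strlen directly instead of combining it with mb_substr.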

You can use the UTF-8 encoding table from Wikipedia, together with the information from the utf8-chartable website, to work out how many bytes each char in your string might need.
