2

I have a bit of a stupid question:

Currently I am making a website for a company on a server that has a rather outdated PHP version (5.2.17). I have a database in which many fields are varchar and contain characters like 'é ä è ê' and so on, which I have to display in an HTML page.

Because the PHP version is outdated (and I am not allowed to update it, since other parts of the site must keep working and I have no access to edit them), I can't use the htmlentities function with the ENT_SUBSTITUTE flag, which was only added in version 5.4.

So my question is:

Is there an alternative to htmlentities($string, ENT_SUBSTITUTE), or do I have to write a function myself that maps all kinds of special characters, which would be incomplete anyway?

JohannesB
  • There's nothing about having accented characters in a database that would necessitate using `ENT_SUBSTITUTE` per se. What problem are you [actually trying to solve](http://meta.stackexchange.com/a/66378/476)?! – deceze Aug 07 '13 at 13:38
  • possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through), [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/), [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) – deceze Aug 07 '13 at 13:39

3 Answers

2

Define a function that handles ill-formed byte sequences and call it before passing the string to htmlentities. There are various ways to define such a function.

First, try UConverter::transcode if you are not on Windows.

http://pecl.php.net/package/intl

If you are willing to handle bytes directly, see my previous answer.

https://stackoverflow.com/a/13695364/531320
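As a rough illustration of the byte-level route, here is a minimal sketch that should also run on old PHP versions, since it uses only preg_replace_callback with a named callback (no closures). The names scrub_utf8 and scrub_utf8_callback are made up for this example and are not built-ins. It replaces each byte that is not part of a well-formed UTF-8 sequence with U+FFFD, which may differ in detail from what ENT_SUBSTITUTE does for multi-byte garbage.

```php
<?php
// Hypothetical helpers: scrub_utf8() and scrub_utf8_callback() are
// names invented for this sketch, not PHP built-ins.

function scrub_utf8_callback($m)
{
    // Group 1 holds a well-formed UTF-8 sequence. It is empty when only
    // the single-byte fallback (group 2) matched, i.e. the byte is invalid.
    return $m[1] !== '' ? $m[1] : "\xEF\xBF\xBD";
}

function scrub_utf8($string)
{
    // Well-formed UTF-8 byte sequences (no overlongs, no surrogates,
    // nothing above U+10FFFF). The pattern runs in byte mode (no /u).
    $wellFormed = '[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';

    // Either consume one well-formed sequence, or exactly one stray byte.
    return preg_replace_callback(
        '/(' . $wellFormed . ')|(.)/s',
        'scrub_utf8_callback',
        $string
    );
}

// Usage: the lone \xE9 is a Latin-1 byte, the \xC3\xAF pair is valid UTF-8.
$dirty = "caf\xE9 na\xC3\xAFve";
echo htmlentities(scrub_utf8($dirty), ENT_QUOTES, 'UTF-8'), "\n";
```

Once the string is scrubbed, htmlentities no longer sees ill-formed input, so the flag is not needed for this call.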

The last option is to develop a PHP extension. Thanks to php_next_utf8_char, it's not hard. Here is a code sample. The name "scrub" comes from Ruby 2.1 (see "Equivalent of Iconv.conv("UTF-8//IGNORE",...) in Ruby 1.9.X?").

// header file
// PHP_FUNCTION(utf8_scrub);

#include "ext/standard/html.h"
#include "ext/standard/php_smart_str.h"

const zend_function_entry utf8_string_functions[] = {
    PHP_FE(utf8_scrub, NULL)
    PHP_FE_END
};

PHP_FUNCTION(utf8_scrub)
{
    char *str = NULL;
    int len, status;
    size_t pos = 0, old_pos;
    unsigned int code_point;
    smart_str buf = {0};

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len) == FAILURE) {
        return;
    }

    while (pos < len) {

        old_pos = pos;
        code_point = php_next_utf8_char((const unsigned char *) str, len, &pos, &status);

        if (status == FAILURE) {

            smart_str_appendl(&buf, "\xEF\xBF\xBD", 3);

        } else {

            smart_str_appendl(&buf, str + old_pos, pos - old_pos);

        }

    }

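    smart_str_0(&buf);

    if (buf.c == NULL) {
        /* Empty input: nothing was appended, so there is no buffer to return. */
        RETURN_EMPTY_STRING();
    }

    /* dup = 0 hands ownership of buf.c to the return value, so it must not be
       freed here (a smart_str_free() after RETURN_STRINGL would be unreachable anyway). */
    RETURN_STRINGL(buf.c, buf.len, 0);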
}
masakielastic
0

You don't need ENT_SUBSTITUTE if your encoding is handled correctly.

If the characters in your database are utf-8, stored in utf-8, read in utf-8 and displayed to the user in utf-8 there should be no problem.
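As a minimal sketch of what "UTF-8 everywhere" can look like in practice, assuming a MySQL database accessed through mysqli (the connection details are placeholders and html_escape is a name made up for this example):

```php
<?php
// Assumption: the page is served as UTF-8 and the database connection
// is switched to UTF-8 too. Connection details below are placeholders.

// 1. Tell the browser which encoding the page uses.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 2. Ask MySQL to deliver UTF-8. Commented out because it needs a live
//    connection; mysqli::set_charset() exists on old PHP 5 versions too.
// $db = new mysqli('localhost', 'user', 'password', 'database');
// $db->set_charset('utf8');

// 3. Escape only the HTML-special characters. Accented letters can stay
//    as they are, because the page itself is UTF-8.
function html_escape($string)
{
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

echo '<td>', html_escape("caf\xC3\xA9 <fresh>"), "</td>\n";
```

With the encoding consistent at every step, the accented characters never need to be turned into entities at all.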

Halcyon
  • On my html page I get characters like ���� all the time, so I guess something is wrong then :/ – JohannesB Aug 07 '13 at 13:09
  • Yep, that looks bad. You can probably find some guides on how to set it up properly. – Halcyon Aug 07 '13 at 13:10
  • I use encode_utf8($string) and most of it displays correctly now, but some characters just disappear, it's really weird. – JohannesB Aug 07 '13 at 14:30
  • What about those people who use Windows-1251 because the legacy CMS of a legacy site uses it? And who parse data from a UTF-8 site and need to convert every symbol absent in Win-1251 to an HTML entity? And have to deal with PHP 5.3 on their hosting? The world itself is utterly incorrect, I am sorry. – Gherman Jun 10 '14 at 12:20
  • I would change the site to use Windows-1251 encoding, that seems the most sensible. – Halcyon Jun 11 '14 at 11:35
0

Just add

if (!defined('ENT_SUBSTITUTE')) define('ENT_SUBSTITUTE', 0);

and you'll be able to pass ENT_SUBSTITUTE to htmlentities.
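Keep in mind that a value of 0 only makes the constant name resolve; it does not add the substitution behaviour. A quick way to check what the fallback does with ill-formed input is the following sketch (encode_with is a name made up for this example):

```php
<?php
// Hypothetical helper for this sketch: combines ENT_QUOTES with an
// extra flag value, so the fallback value 0 can be tried explicitly.
function encode_with($string, $extraFlags)
{
    return htmlentities($string, ENT_QUOTES | $extraFlags, 'UTF-8');
}

$illFormed = "caf\xE9"; // a lone Latin-1 byte, not valid UTF-8

// 0 is the fallback value from the define() guard. With it the flag is
// a no-op, and htmlentities() is expected to return an empty string
// for ill-formed input instead of substituting U+FFFD.
var_dump(encode_with($illFormed, 0));

// Well-formed input is unaffected either way.
var_dump(encode_with("caf\xC3\xA9", 0));
```

So the guard avoids the undefined-constant notice, but ill-formed strings still have to be cleaned some other way.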

webshaker
  • What's the point of this? This will simply hide any E_NOTICE (resulting from the constant not being defined), it won't actually make it "work"?! – MrWhite Jan 25 '21 at 19:18