I have a form that, when submitted from an iPhone, stores a ’ entered by the user in the database as ’.

I'm wondering if there is a way to convert this to a single character before entering it into the database. The main reason I need this is because it is being sent out as a txt message and every character counts.

I'd like to know if there is a function to convert these characters

— enters as — convert to -
– enters as — convert to -
“ enters as “ convert to "
” enters as †convert to "
‘ enters as ‘ convert to '
’ enters as ’ convert to '

The problem really is not that it's stored that way in the database, but rather when the txt message is sent pulling data from the database.

In further testing I eliminated the database: a form submits to PHP, which emails the text to an SMS gateway. When I use the phone to enter characters such as “ ”, they do not come through in the txt message, which makes me think they are becoming mojibake. I have set `<meta charset="UTF-8" />` in the page with the form.

Here is another illustration of the problem. An iPhone (6s, iOS 11.2.2, Safari) submits text to a PHP script, which emails it to an SMS gateway. The text comes through without the special characters (“ ” ‘ ’); each one is shown as a b instead. For example, text sent as “test” ‘test’ comes through in the txt as btestb btestb. Below is the ultra simple code that reproduces this issue.

filename: sms.php (using php 7.1.13)

<?php
if(isset($_POST['sub'])){
    $data = isset($_POST['data'])?$_POST['data']:NULL;
    if($data){
        if(mail('5555555555@messaging.sprintpcs.com','',$data,'From: name@somedomain.com')){
            echo 'sent!';
        }
    }
}
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <title>test</title>
    <meta charset="UTF-8" />
    <meta content="minimum-scale=1.0, width=device-width, maximum-scale=1, user-scalable=no" name="viewport">
</head>
<body>
    <form action="sms.php" method="post">
        <input label="enter txt here" value="" name="data" />
        <input type="submit" value="go" name="sub" />
    </form>
</body>
</html>
  • use an iPhone to enter the following characters “ ” ‘ ’
drooh
  • Could this be more of an encoding issue maybe? – C Miller Dec 23 '17 at 00:09
  • why don't you just add it as a blob value? This way you can count the number of bytes in the blob field rather than counting chars – Psi Dec 23 '17 at 00:10
  • Yeah, it looks like the iPhone is sending a UTF-8 submission, and you're somehow not entering it into the database as UTF-8. What is your DB charset? Table charset? The solution is to allow UTF-8 characters in the database, not try to do a conversion. – Brian Gottier Dec 23 '17 at 00:20
  • That is Mojibake for `’`. For a fix, see http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases – Rick James Dec 28 '17 at 00:17
  • Please see [character set conversion](https://stackoverflow.com/questions/4387884/character-set-conversion) and [weird charactors on HTML page](https://stackoverflow.com/questions/3111215/weird-charactors-on-html-page) – ctwheels Jan 17 '18 at 20:25
  • @drooh If you end up “converting” mangled characters, you are doing something terribly wrong and your headaches might get worse. For your own happiness and the sake of a learning effect, please start at what Jeremy Jones described in his/her answer below. Try to find the underlying encoding problem – it is most likely situated at some transmission point (e.g. from HTML to PHP or from PHP to MySQL). – feeela Jan 22 '18 at 13:49
  • To further debug this, use `echo bin2hex(...)` on the string that is coming in to PHP. Then show us the output, plus tell us what the string should be. – Rick James Jan 22 '18 at 21:12
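
The `bin2hex` check suggested in the last comment can be sketched roughly as follows. This is an illustration only, not code from the thread: the helper name `describe_bytes` is hypothetical, and the two hard-coded strings stand in for whatever arrives in `$_POST['data']`.

```php
<?php
// Hypothetical helper (not from the thread): show the raw bytes of a string
// and report whether they form valid UTF-8.
function describe_bytes(string $s): string
{
    // preg_match() with the /u modifier returns 1 only if $s is valid UTF-8.
    $valid = preg_match('//u', $s) === 1 ? 'valid UTF-8' : 'not valid UTF-8';
    return bin2hex($s) . ' (' . strlen($s) . ' bytes, ' . $valid . ')';
}

// U+2019 RIGHT SINGLE QUOTATION MARK, correctly encoded as UTF-8.
$correct = "\xE2\x80\x99";

// The same character after its UTF-8 bytes were misread as Windows-1252
// and encoded to UTF-8 a second time; this displays as the three
// characters seen in the question.
$doubleEncoded = "\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2";

echo describe_bytes($correct), "\n";       // e28099 (3 bytes, valid UTF-8)
echo describe_bytes($doubleEncoded), "\n"; // c3a2e282ace284a2 (8 bytes, valid UTF-8)
```

If the hex for ’ arriving in PHP is `e28099`, the form submission itself is fine and the damage happens at a later step (database connection, mail, or display).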

2 Answers

Essentially everything needs to be UTF-8 to deal with this. Tracking down the place where the corruption is happening is tedious but it's the only real answer. It could be early on, e.g. when the information is coming into the PHP script or going into the database, or later when it's being retrieved.

A final possibility to keep in mind is that the data might not be corrupted at all. Sometimes the terminal or other output at the very end of the chain isn't set correctly, so the text only looks wrong in your viewer while the data itself and the way it is stored are fine.
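
One link in that chain worth checking in the question's script is the email itself: the `mail()` call there passes only a `From:` header, so nothing tells the recipient which encoding the body uses. Below is a minimal sketch of declaring the body as UTF-8. The helper name `build_utf8_headers` is hypothetical, and whether the SMS gateway honors the declared charset is an assumption that would need testing.

```php
<?php
// Hypothetical helper (not from the thread): build mail() headers that
// declare the message body as UTF-8 plain text.
function build_utf8_headers(string $from): string
{
    // Strip CR/LF so the address cannot be used to inject extra headers.
    $from = str_replace(["\r", "\n"], '', $from);

    return implode("\r\n", [
        'From: ' . $from,
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ]);
}

$headers = build_utf8_headers('name@somedomain.com');
echo $headers, "\n";

// Usage with the question's script (commented out so that running this
// sketch does not send anything):
// mail('5555555555@messaging.sprintpcs.com', '', $data, $headers);
```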

Jeremy Jones

I reopened because this question implies that the Mojibake came from MySQL; the other question treated it as a PHP problem. PHP and HTML are unlikely to cause the problem; the source of the problem is a mismatch of latin1 and utf8 when inserting/retrieving data via MySQL.

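See "Mojibake" in [Trouble with UTF-8 characters; what I see is not what I stored](https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored), and ways to fix the data: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases .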
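
The effect of such a mismatch can be reproduced in plain PHP without a database. The sketch below is an illustration, not the code path MySQL actually runs: it assumes the `mbstring` extension and uses `Windows-1252` as a stand-in for MySQL's `latin1`, which is close to it.

```php
<?php
// Requires the mbstring extension.

// U+2019 RIGHT SINGLE QUOTATION MARK as UTF-8 bytes, as the iPhone submits it.
$original = "\xE2\x80\x99";

// Simulate the mismatch: treat the UTF-8 bytes as Windows-1252 text and
// convert that "text" to UTF-8. Each of the 3 bytes becomes its own character.
$mojibake = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo the mistake: encode the three characters back to Windows-1252 bytes,
// which are the original UTF-8 bytes again.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');

echo bin2hex($original), "\n"; // e28099
echo bin2hex($mojibake), "\n"; // c3a2e282ace284a2
echo bin2hex($repaired), "\n"; // e28099
```

The round trip only works because every byte involved has a Windows-1252 mapping; it is meant to show where the three-character sequence comes from, not to serve as a general repair tool.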

Rick James
  • I'm also having an issue with this character: `â€“`. It happens when I copy and paste info from another website into a form in my CMS, which has utf-8 set – drooh Jan 16 '18 at 03:42
  • @drooh - `â€“` is [_Mojibake_](https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) for an EN DASH (`–`). Somewhere `latin1` is involved in your CMS. – Rick James Jan 16 '18 at 12:42
  • Could this happen when copy / paste from another source? I have utf-8 in the header of my docs. This happens on my home server and commercial host. Where would I look for latin1? – drooh Jan 16 '18 at 16:33
  • More details, please -- you mentioned CMS, phone, two hosts, copy, etc. What code in what language happens at each step? – Rick James Jan 16 '18 at 18:55
  • Basically when using an iPhone to enter text into an input it is stored (php/jquery ajax) in the db (mysql) with the mojibake characters – drooh Jan 16 '18 at 21:45
  • @drooh - PDO? mysqli? `<form accept-charset="UTF-8">`? Let's see `SHOW CREATE TABLE`. – Rick James Jan 16 '18 at 22:14
  • its mysqli and I tried the form accept-charset="UTF-8", not sure what SHOW CREATE TABLE is – drooh Jan 16 '18 at 22:41
  • @drooh - Use the mysql commandline tool to connect to the database; use `SHOW CREATE TABLE` to get the table definition. – Rick James Jan 16 '18 at 23:43
  • ENGINE=MyISAM AUTO_INCREMENT=48 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci – drooh Jan 17 '18 at 04:59
  • @drooh - I was hoping for the entire `SHOW`, not just the last line. – Rick James Jan 17 '18 at 12:59
  • @drooh: I'd put my money on the problem here being the charset of the database connection. With MySQLi, you should be calling [`$mysqli->set_charset('utf8')`](http://php.net/manual/en/mysqli.set-charset.php) after connecting. That said, if you're trying to reduce the byte length (as opposed to character length) of the resulting txt message, you may also wish to try something like [`iconv('UTF-8', 'ASCII//TRANSLIT', $str)`](https://secure.php.net/manual/en/function.iconv.php). – eggyal Jan 18 '18 at 10:21
  • @eggyal - What will happen for the approx 1,110,000 UTF-8 characters that have no 1-byte equivalent? I have a Rule of Thumb -- If an idea won't save more than 10% of something, move on to other things. (Very few strings will be shortened by that translit by more than 3%.) – Rick James Jan 18 '18 at 16:05
  • @RickJames: Unfortunately I can’t find a complete mapping, but for the specific characters discussed in the question (which the OP said he wants “*to convert to a single character ... because it is being sent out as a txt message and every character counts*”; I assume he actually meant bytes, not characters, since SMS is limited by the string’s octet length not character length) they will be converted exactly as requested. – eggyal Jan 18 '18 at 17:34
  • Emojis are 3-4 bytes. But still that is a lot shorter than "pile of poo" for 💩 – Rick James Jan 19 '18 at 03:35
  • I just need to convert the most commonly used characters typed on iPhone so that they are 1 byte, right? Doing something like this solves the majority of the issue: `$a = array('—','–','“','”','‘','’'); $b = array('-','-','"','"','\'','\''); $s = str_replace($a,$b,$s);` – drooh Jan 19 '18 at 07:10
  • This answer is just guessing. The encoding problem may also originate from an HTML form being sent in another encoding than what the script (PHP) on the server is using. This is not unlikely but a common mistake. – feeela Jan 22 '18 at 13:45
  • I'm wondering if this might be the issue, is there a way to check the php encoding? – drooh Jan 22 '18 at 20:17
  • @drooh - Use bin2hex to peer into what is really in PHP strings. Use `SELECT HEX(...) ... ` to see what is really in the database table. Mojibake is correctly stored bytes that are _incorrectly interpreted_. – Rick James Jan 22 '18 at 21:18
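
The replacement idea from the comments above can be sketched as below. It is only an illustration under two assumptions: the input is already correct UTF-8 (so the encoding problem has been fixed first), and the helper name `to_ascii_punctuation` is hypothetical. Code points are written as `\u{...}` escapes (PHP 7.0+) so the sketch does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper (not from the thread): replace common typographic
// punctuation with single-byte ASCII equivalents. Input must be UTF-8.
function to_ascii_punctuation(string $s): string
{
    return strtr($s, [
        "\u{2014}" => '-', // em dash
        "\u{2013}" => '-', // en dash
        "\u{201C}" => '"', // left double quotation mark
        "\u{201D}" => '"', // right double quotation mark
        "\u{2018}" => "'", // left single quotation mark
        "\u{2019}" => "'", // right single quotation mark
    ]);
}

$input  = "\u{201C}test\u{201D} \u{2018}test\u{2019}";
$output = to_ascii_punctuation($input);

echo $output, "\n";                                          // "test" 'test'
echo strlen($input), ' -> ', strlen($output), " bytes\n";    // 21 -> 13 bytes
```

Mapping the intact characters, rather than their mangled three-character forms, keeps the table small and avoids depending on how the text happened to be corrupted.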