I am trying to save the subject of incoming emails to a database. The subject is not always in the same encoding, so I wrote this code to convert it to UTF-8 when it isn't already:
    private function convertSubjectEncoding($subject)
    {
        $encoding = mb_detect_encoding($subject);
        if ($encoding != 'UTF-8') {
            return iconv_mime_decode($subject, 0, "UTF-8");
        }
        return $subject;
    }
For the first message, the detected encoding is UTF-8 and the subject is "Accès SQL". When it is saved to the database, it becomes "Acce`s SQL", which is wrong; it should be "Accès SQL".
For the second message, the detected encoding is ASCII and the raw subject is "=?utf-8?Q?Acc=C3=A8s_?=SQL". After conversion, and also after saving, it is "Accès SQL", which is correct.
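For reference, the second case can be reproduced in isolation with a minimal sketch like the one below. The literal is the raw subject quoted above; the variable names and the hex dump are only there for illustration and are not part of my application code.

```php
<?php
// Raw header value of the second message: pure ASCII, so
// mb_detect_encoding() does not report UTF-8 for it.
$raw = "=?utf-8?Q?Acc=C3=A8s_?=SQL";

// Same call as in convertSubjectEncoding(): decode the MIME
// encoded-word and return the result as UTF-8.
$decoded = iconv_mime_decode($raw, 0, "UTF-8");

echo $decoded, PHP_EOL;
// Hex dump of the decoded subject, to see exactly which bytes are stored.
echo bin2hex($decoded), PHP_EOL;
```

This path is the one that behaves correctly for me, so it is mainly useful as a baseline to compare the first message against.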
Why does a string that was originally UTF-8, and did not go through any encoding conversion, suddenly become a different string when saved? I am using Laravel 6. Here is the full relevant code:
    const SUBJECT_REPLY_FORWARD_REGEX = "/([\[\(] *)?\b(RE|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$/im";

    private function createFetchedMail($message)
    {
        $toList = $message->getTo();
        $fetchedMail = FetchedMail::create([
            'OriginalSubject' => $this->convertSubjectEncoding($message->getSubject()),
            'Subject' => $this->cropSubject($this->convertSubjectEncoding($message->getSubject())),
        ]);
    }

    /**
     * Removes subject reply and forwarding indicators (Re:, FWD:, etc.) and trims the result
     */
    private function cropSubject($subject)
    {
        return trim(preg_replace(static::SUBJECT_REPLY_FORWARD_REGEX, '', $subject));
    }

    private function convertSubjectEncoding($subject)
    {
        $encoding = mb_detect_encoding($subject);
        if ($encoding != 'UTF-8') {
            return iconv_mime_decode($subject, 0, "UTF-8");
        }
        return $subject;
    }
I have also tried saving the subject directly, without calling convertSubjectEncoding() or cropSubject(), and I get the same erroneous string in the database.
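In case it helps narrow this down, here is a minimal sketch of how the bytes of two subjects that look alike could be compared. It rests on assumptions: the two literals are hypothetical stand-ins for the real subjects (one with a precomposed è, one with a plain e followed by a combining grave accent, which is only a guess at what the first message might contain), and dumpSubject() is a helper name made up for this example.

```php
<?php
// Hypothetical helper (not part of the original code): reports whether
// the string is valid UTF-8, its length in bytes, and its raw bytes in hex.
function dumpSubject(string $label, string $subject): string
{
    return sprintf(
        "%s: detected=%s bytes=%d hex=%s",
        $label,
        var_export(mb_detect_encoding($subject, 'UTF-8', true), true),
        strlen($subject),
        bin2hex($subject)
    );
}

// Assumed stand-ins for the real subjects.
$precomposed = "Acc\u{00E8}s SQL";  // è as the single code point U+00E8
$decomposed  = "Acce\u{0300}s SQL"; // e followed by combining grave U+0300

echo dumpSubject('precomposed', $precomposed), PHP_EOL;
echo dumpSubject('decomposed', $decomposed), PHP_EOL;

// Both strings pass the UTF-8 check, yet they are different byte sequences.
var_dump($precomposed === $decomposed);
```

Running the same helper on the value returned by $message->getSubject() and on the value read back from the database would show whether the bytes change on the way in, or whether the stored bytes are identical and only rendered differently.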