
I have a CSV file to import into the database. In many places, strings are encoded like this: "Mete y S\303\241cala".

I want to decode them back into the original string, i.e. "Mete y Sácala".

Reference: https://mothereff.in/utf-8

I have used the following function:

iconv(mb_detect_encoding("Mete y S\303\241cala", mb_detect_order(), true), "UTF-8","Mete y S\303\241cala");

This works perfectly!

I'm reading the file with ExcelReader and looping over its contents.

But when I use the actual variable, it does not convert it:

// loop

iconv(mb_detect_encoding($rec['title'], mb_detect_order(), true), "UTF-8",$rec['title']);

It does not work with the loop variable. It might be an issue with the backslashes.

ndm
Tech Aimley
  • Is there a reason you're not using [UTF-8 all the way through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through/279279)? – CD001 Jan 21 '19 at 15:36
  • My DB Collation is utf8_general_ci – Tech Aimley Jan 21 '19 at 15:50
  • Looks like octal byte escapes, so `stripcslashes()`. // Please fix your question title; this is misleading. – mario Jan 21 '19 at 15:50
  • Yes, the original encoding was UTF-8. But clearly the high bytes have been converted to C-style escapes. I'd suggest trying things out before relying on flawed assumptions. – mario Jan 21 '19 at 16:25

2 Answers


The PHP standard library has a solution:

$decodedString = utf8_decode($string);
  • I'm reading a file using ExcelReader, looping over the column contents and saving them into the database. If I use it like this: utf8_decode($record['title']); it doesn't work. – Tech Aimley Jan 21 '19 at 15:49

stripcslashes() does not just strip backslashes; it also decodes C-style escapes such as \r, \n and octal \123 sequences, which is clearly what the CSV encoder produced.
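A minimal sketch of that decoding step, assuming the field value holds literal backslash-octal text (single quotes are used here to mimic what a file reader returns):

```php
<?php
// Single quotes keep the backslashes literal, like data read from the CSV.
$raw = 'Mete y S\303\241cala';

// stripcslashes() turns \303 and \241 into the bytes 0xC3 and 0xA1,
// which together are the UTF-8 encoding of "á".
$decoded = stripcslashes($raw);

echo $decoded, PHP_EOL; // Mete y Sácala (on a UTF-8 terminal)
```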

I have used below function

iconv(mb_detect_encoding("Mete y S\303\241cala", mb_detect_order(), true), "UTF-8","Mete y S\303\241cala");

this works perfect!

That's not what's happening there. When PHP encounters \303\241 inside a double-quoted string expression such as "Mete y S\303\241cala", it interprets those octal escapes back into the original string bytes. Neither iconv nor the mb_* functions are doing anything here.
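To see that PHP's parser, not iconv or mb_*, does the work, one can compare the two kinds of literal directly. This is a small illustrative sketch; the variable names are arbitrary:

```php
<?php
// Double quotes: PHP decodes the octal escapes while parsing the literal.
$double = "Mete y S\303\241cala";

// Single quotes (or data read from a file): the backslashes stay literal.
$single = 'Mete y S\303\241cala';

echo strlen($double), PHP_EOL; // 14: "á" is the two bytes 0xC3 0xA1
echo strlen($single), PHP_EOL; // 20: each "\303" is four literal characters

// The double-quoted literal already equals the real text before any
// iconv()/mb_* call runs, so those calls have nothing left to convert.
var_dump($double === "Mete y S\u{e1}cala"); // bool(true)
var_dump($single === "Mete y S\u{e1}cala"); // bool(false)
```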

That is not the case for a single-quoted string such as 'Mete y S\303\241cala', or for literal data read from a file. In those cases you have to decode the octal escapes yourself, using the aforementioned stripcslashes().
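Applied to the import loop from the question, a sketch might look like the following. The `$records` array is a hypothetical stand-in for whatever ExcelReader yields; only the `title` key comes from the question:

```php
<?php
// Hypothetical rows standing in for the ExcelReader output;
// single quotes reproduce the literal backslash-octal text from the file.
$records = [
    ['title' => 'Mete y S\303\241cala'],
    ['title' => 'Plain ASCII title'],
];

foreach ($records as &$rec) {
    // Decode C-style escapes (\ooo octal, \xhh hex, \n, \r, ...) into raw bytes.
    $rec['title'] = stripcslashes($rec['title']);
}
unset($rec); // break the reference left over from the by-reference loop

foreach ($records as $rec) {
    echo $rec['title'], PHP_EOL;
}
```

One caveat: stripcslashes() also drops any other backslash it finds (an unrecognised escape like `\p` becomes `p`), so this assumes the source data never contains backslashes that must survive.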

Anyway, this isn't "UTF-8 encoded". It's an additional byte-escape encoding layered on top of UTF-8.
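The layering can be illustrated with a round trip. This sketch uses addcslashes() only to reproduce the escaped form; it is an assumption for demonstration, not necessarily what the CSV exporter actually ran:

```php
<?php
$text = "Mete y S\u{e1}cala"; // real UTF-8 text: "á" is the bytes 0xC3 0xA1

// Escaping every byte in the range 0x80-0xFF reproduces the CSV's form:
// addcslashes() writes bytes above 126 as three-digit octal escapes.
$escaped = addcslashes($text, "\x80..\xFF");
echo $escaped, PHP_EOL; // Mete y S\303\241cala

// stripcslashes() peels that extra layer off again, leaving the UTF-8 bytes.
var_dump(stripcslashes($escaped) === $text); // bool(true)
```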

mario