I am importing a CSV file into my database using MySQL's LOAD DATA INFILE command. The file is not necessarily UTF-8 encoded, and I have no control over that, so I must resort to pre- or post-processing. Both my database and my HTML web pages enforce UTF-8 encoding. Since I use LOAD DATA INFILE, the processing has to happen after the import: I read the rows back out of the database and apply my post-processing filter using htmlentities().

   foreach($records as $r)
       $updates[] = htmlentities($r["column"], ENT_COMPAT, 'UTF-8');

Then I update the DB table again.

The columns go into the table just as they should before the post-processing, but afterwards they are blank. That means htmlentities() returned an empty string, which is a valid return value for it.

Specifically a candidate value is:

"PJ Weatherproof 32 ®"

Any idea why?

Parijat Kalia
  • Not an answer to your question, but why are you using `htmlentities()` in the first place? There should be no need for it. – Pekka Jul 11 '13 at 19:31
  • Because shouldn't I need to escape some of these HTML entities? – Parijat Kalia Jul 11 '13 at 19:35
  • Nope. If everything is properly UTF-8 encoded, there shouldn't be a need for HTML entities. – Pekka Jul 11 '13 at 20:01
  • It isn't necessarily properly encoded in the CSV file, which is why I must post-process it. I have no control over the file. – Parijat Kalia Jul 11 '13 at 20:14
  • But if data isn't properly encoded in the CSV file, I don't see how `htmlentities()` can do anything to fix it. – Pekka Jul 12 '13 at 07:36

1 Answer

The "UTF-8" parameter in your call promises htmlentities() that the incoming data will be UTF-8. When the data isn't valid UTF-8, the function returns an empty string.
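A minimal sketch of that behaviour, using the sample value from the question. It assumes the "®" in the CSV arrives as the single ISO-8859-1 byte 0xAE, which is only a guess about the file; the variable names are illustrative.

```php
<?php
// Assumption: the CSV stores "®" as the ISO-8859-1 byte 0xAE,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "PJ Weatherproof 32 \xAE";

// The same text correctly encoded as UTF-8 (0xC2 0xAE).
$utf8 = "PJ Weatherproof 32 \xC2\xAE";

// Same call as in the question: ENT_COMPAT only, charset declared as UTF-8.
$fromLatin1 = htmlentities($latin1, ENT_COMPAT, 'UTF-8');
$fromUtf8   = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

var_dump($fromLatin1); // empty string: the input is not valid UTF-8
var_dump($fromUtf8);   // the valid input comes back with "®" as &reg;
```

The empty result appears because neither ENT_IGNORE nor ENT_SUBSTITUTE is among the flags passed, so an invalid byte sequence makes the whole return value empty.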

You'll need to try and sniff the encoding, which is an unreliable process and will work well only when you have a very limited set of possible encodings. See e.g. this answer.
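As a rough sketch of that idea, under the assumption that the file is either already UTF-8 or ISO-8859-1 and nothing else: the helper name `toUtf8` and the fallback encoding are hypothetical choices, and the mbstring extension must be available.

```php
<?php
// Hypothetical helper: returns $value as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
// With more candidate encodings this kind of guess becomes unreliable.
function toUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Otherwise reinterpret the bytes using the assumed fallback encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// The sample value from the question, with "®" as the byte 0xAE.
$fixed = toUtf8("PJ Weatherproof 32 \xAE");
var_dump(mb_check_encoding($fixed, 'UTF-8'));
```

Applied to each record in place of the htmlentities() call, this would store the text as real UTF-8 rather than as HTML entities.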

Either way, you can get rid of the htmlentities() call - it will do nothing to help the situation, just uselessly add HTML entities where they aren't needed.

Pekka