
In a web application, we allow users to add data by uploading CSV files.

Sometimes the fields in these CSV files contain non-ASCII characters, for example the ç in François.

When they do, the upload often fails, because the name field is a key field in the uploaded file and those fields come back empty.

We have set the form encoding using

<form accept-charset="UTF-8">

and the page itself is encoded as UTF-8 with

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

but the uploaded file still "fails".

As a test we also tried using

utf8_encode();

on the fields in question, but the same problem occurs.

If the end user saves the file as UTF-8 before uploading it, it works fine. But most of our users are not very technical, so we might as well be speaking an alien language as try to get them to set the encoding of the file. Is there any other way to handle this, so that we can force or convert every uploaded file to UTF-8?
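For context: the form's accept-charset and the page's meta charset only govern text typed into form fields; the bytes of an uploaded file arrive on the server unchanged, so any conversion has to happen server-side. A minimal sketch of one common approach, under assumptions that are mine and not the question's: keep the file as-is if it is already valid UTF-8, otherwise convert it from one assumed legacy encoding. Here that fallback is Windows-1252, which is typically what Excel writes for "CSV" on Western European Windows systems; a file in any other encoding would still be misread. The function name toUtf8() is hypothetical, and the mbstring extension is assumed to be enabled.

```php
<?php
// Minimal sketch: normalize an uploaded CSV's bytes to UTF-8 before parsing.
// Assumptions (not from the question): mbstring is enabled, and any file that
// is not valid UTF-8 was saved as Windows-1252. toUtf8() is a hypothetical name.

function toUtf8(string $raw, string $fallback = 'Windows-1252'): string
{
    // Drop a UTF-8 byte order mark, which some editors write at the start.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $raw = substr($raw, 3);
    }
    // Already valid UTF-8: return unchanged, so it is never double-encoded.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise reinterpret the bytes in the assumed legacy encoding.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// The same CSV line saved two ways ends up identical after normalization.
$fromExcel = "Fran\xE7ois,Paris";   // Windows-1252 bytes: 'ç' is 0xE7
$fromUtf8  = "Fran\u{E7}ois,Paris"; // UTF-8 bytes: 'ç' is 0xC3 0xA7

foreach ([$fromExcel, $fromUtf8] as $raw) {
    $fields = str_getcsv(toUtf8($raw), ',', '"', '\\');
    echo $fields[0], PHP_EOL;       // François
}
```

In an upload handler this would be applied to something like file_get_contents($_FILES['csv']['tmp_name']) before parsing, where 'csv' stands in for whatever the form's file input is actually called.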

bhttoan

1 Answer


I had the very same problem not long ago: a customer uploaded a text file (same idea as a CSV) written in French, which was then read and echoed out by PHP files.

What you need to do is this.

Change your headers to the following if you're not already doing so:

header ('Content-type: text/html; charset=iso8859-15');

This is important so that the content is read as ISO-8859-15 and not as UTF-8.

Then use the utf8_encode() function again, as you already tried:

$file = utf8_encode($file);
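A small sketch of what this step does at the byte level. utf8_encode() treats its input as ISO-8859-1 (not ISO-8859-15; the two differ in eight positions, e.g. byte 0xA4 is ¤ in ISO-8859-1 but € in ISO-8859-15) and re-encodes each byte as UTF-8. The sketch uses mb_convert_encoding() with ISO-8859-1 as the source, which performs the same conversion, because utf8_encode() is deprecated as of PHP 8.2; it assumes the mbstring extension is enabled. It also shows what happens if the input is already UTF-8, the case raised in the first comment below.

```php
<?php
// Byte-level sketch of an ISO-8859-1 -> UTF-8 conversion (what utf8_encode() does).
// Uses mb_convert_encoding() because utf8_encode() is deprecated since PHP 8.2;
// assumes the mbstring extension is enabled.

$latin1 = "Fran\xE7ois";      // 'ç' stored as the single byte 0xE7 (8 bytes total)
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo strlen($utf8), PHP_EOL;                 // 9: 'ç' is now two bytes
echo bin2hex(substr($utf8, 4, 2)), PHP_EOL;  // c3a7

// Running the same conversion on text that is ALREADY UTF-8 re-encodes each
// of its bytes separately: 0xC3 0xA7 becomes "Ã§" (double encoding).
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $double, PHP_EOL;                       // FranÃ§ois

// Plain ASCII bytes are identical in both encodings, so English-only text is unaffected.
var_dump(mb_convert_encoding('Francois', 'UTF-8', 'ISO-8859-1') === 'Francois'); // bool(true)
```

Checking mb_check_encoding($text, 'UTF-8') before converting is one way to avoid the double encoding.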

Side note: this took me a while to get working (it was rather tricky), and I was quite glad when it worked out.

In case it helps, this is how the file is read in my client's case:

$file = file_get_contents($french_file, FILE_USE_INCLUDE_PATH);
Funk Forty Niner
  • Thanks for the answer - what if the user uploads a file as UTF-8? I assume then that would cause another issue as the file is already UTF-8 but the header is ISO and we are then trying to encode something which is already UTF-8? As I have no control over what encoding the file is when they upload I am trying to anticipate the alternate use case – bhttoan Jan 09 '18 at 21:08
  • @bhttoan welcome. It won't, or shouldn't, hurt it. If memory serves, I used the same code for the English version of a similar file. As far as I can tell, it shouldn't hurt it. – Funk Forty Niner Jan 09 '18 at 21:09
  • The user must upload a text file with the encoding you tell them to use or you must allow them to tell you what the encoding is for that upload. (Their browser is not going to know or ask them so you telling or asking the browser isn't going to help.) If the user has a profile with your website, you could allow them to tell you their choice for future uploads via a profile page (similar to their preferred timezone, where that applies). [Or, you could avoid text files altogether, especially CSV.] – Tom Blodget Jan 10 '18 at 10:13
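The last comment's suggestion, letting the user state the file's encoding instead of guessing it, could be sketched like this. The whitelist, the function name convertDeclared(), and the idea of a select box on the upload form are illustrative assumptions, not something from the thread; the mbstring extension is assumed to be enabled. Note the limitation in the comments: a wrong declaration can only be detected when the declared encoding is UTF-8.

```php
<?php
// Sketch of the approach from the last comment: the user declares the file's
// encoding (e.g. via a select box on the upload form or a profile setting)
// and the server converts from that. The whitelist and the function name
// convertDeclared() are hypothetical; assumes the mbstring extension is enabled.

const ALLOWED_ENCODINGS = ['UTF-8', 'Windows-1252', 'ISO-8859-1', 'ISO-8859-15'];

function convertDeclared(string $raw, string $declared): string
{
    // Reject anything not offered on the form, rather than passing user input to mbstring.
    if (!in_array($declared, ALLOWED_ENCODINGS, true)) {
        throw new InvalidArgumentException("Unsupported encoding: $declared");
    }
    // Catches e.g. a file declared as UTF-8 that is really Latin-1.
    // (Single-byte encodings such as ISO-8859-1 accept almost any byte sequence,
    // so a wrong single-byte choice cannot be detected here.)
    if (!mb_check_encoding($raw, $declared)) {
        throw new UnexpectedValueException("File is not valid $declared");
    }
    return mb_convert_encoding($raw, 'UTF-8', $declared);
}

// The same byte 0xA4 means different characters depending on the declared encoding.
echo convertDeclared("5 \xA4", 'ISO-8859-15'), PHP_EOL; // 5 €
echo convertDeclared("5 \xA4", 'ISO-8859-1'), PHP_EOL;  // 5 ¤
```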