
I have a CSV file whose encoding I need to change, and I want to do this in PHP. I know there is the mb_convert_encoding function, but it only works on strings.

Is there a function I can use to change the encoding of an entire CSV file?

Cheers,

Update: It turns out the solution to my problem is to remove the BOM from my file.
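A BOM is only a short byte signature at the very start of a file (EF BB BF for UTF-8, FF FE for UTF-16LE, FE FF for UTF-16BE), so it can be removed without touching the rest of the content. The following is a minimal sketch under stated assumptions: the helper name stripBom() and the ".tmp" suffix are hypothetical, UTF-32 BOMs are not handled, and removing the BOM does not change the file's actual encoding.

```php
<?php
// Hypothetical helper: removes a leading byte order mark (BOM) from a file
// and leaves every other byte untouched. The remainder is streamed, so the
// file never has to fit in memory.
function stripBom(string $path): void
{
    // BOM signatures checked by this sketch (UTF-32 BOMs are not handled).
    $boms = ["\xEF\xBB\xBF", "\xFF\xFE", "\xFE\xFF"];

    $source = fopen($path, 'rb');
    if ($source === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $head = (string) fread($source, 3); // longest BOM checked here is 3 bytes

    $bomLength = 0;
    foreach ($boms as $bom) {
        if (strncmp($head, $bom, strlen($bom)) === 0) {
            $bomLength = strlen($bom);
            break;
        }
    }
    if ($bomLength === 0) {
        fclose($source); // no BOM: leave the file as it is
        return;
    }

    $tempPath = $path . '.tmp';
    $target = fopen($tempPath, 'wb');
    fwrite($target, substr($head, $bomLength)); // bytes already read past the BOM
    stream_copy_to_stream($source, $target);     // copy the rest of the file
    fclose($source);
    fclose($target);
    rename($tempPath, $path); // replaces the original file
}
```

Usage: stripBom('EstablishmentExport.csv');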

I am using @treehouse's code below, modified to remove the BOM, but it just keeps filling the temp file forever. What's wrong?

$sourcePath = 'EstablishmentExport.csv';
$tempPath = $sourcePath . 'temp';
$source = fopen($sourcePath, 'r');
$target = fopen($tempPath, 'w');
while(!feof($source)) {
    $line = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $source);
    fwrite($target, $line);
}
fclose($source);
fclose($target);
unlink($sourcePath);
rename($tempPath, $sourcePath);

4 Answers

file_put_contents('the/file/path.csv', mb_convert_encoding(file_get_contents('the/file/path.csv'), 'ENCODING'));

Just fill in the correct file path and the desired type of encoding.
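The same one-liner can be wrapped in a small helper that also passes the source encoding explicitly, as deceze suggests in the comments. This is a sketch: the function name convertFileEncoding() is hypothetical, and 'UTF-8' and 'UTF-16LE' are only example values.

```php
<?php
// Whole-file conversion (fine for files that fit comfortably in memory).
// $from is passed as mb_convert_encoding()'s third parameter; without it,
// mb_internal_encoding() is assumed to be the source encoding.
function convertFileEncoding(string $path, string $to, string $from): void
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (file_put_contents($path, mb_convert_encoding($contents, $to, $from)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}

// Usage, e.g. for a UTF-16LE export that should become UTF-8:
// convertFileEncoding('the/file/path.csv', 'UTF-8', 'UTF-16LE');
```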

Edit: Since the source file is apparently huge, you'll have to process it line by line, which can be done with fopen() and fgets(). You then need to write the newly encoded lines to a temporary file, and rename it to the original filename after deleting the original file:

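$sourcePath = 'path/to/file.csv';
$tempPath = $sourcePath . 'temp';
$source = fopen($sourcePath, 'r');
$target = fopen($tempPath, 'w');
// Loop on fgets() itself: it returns false at end of file, so that false
// value is never handed to mb_convert_encoding().
while (($line = fgets($source)) !== false) {
    // Replace 'ENCODING' with the target encoding, e.g. 'UTF-8'; the source
    // encoding can be passed as a third argument.
    fwrite($target, mb_convert_encoding($line, 'ENCODING'));
}
fclose($source);
fclose($target);
// Replace the original file with the converted copy.
unlink($sourcePath);
rename($tempPath, $sourcePath);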
ksbg
  • The file is 300MB; would that be too big to load into memory? Also, when I put it into a string, how does it keep the line breaks? –  Jul 01 '15 at 14:07
  • You don't have to worry about line breaks; however, the size is a valid concern. I'll come up with a solution. – ksbg Jul 01 '15 at 14:11
  • I'd use `tmpfile` or `php://temp` or such for the temporary file... Also supply the *from* encoding for `mb_convert_encoding`... Apart from that, +1. – deceze Jul 01 '15 at 14:37
  • Hi, I have realised that instead of changing the encoding I can just remove the BOM to fix my file. I have modified the code, but instead it just fills the temp file forever. I have added the amended code to the question. –  Jul 01 '15 at 16:33
  • That happens because, right before the regular expression, you have to write `$line=fgets($source);` in order to move the pointer to the next line. Right now your loop always stays on the same line. Also, `preg_replace` must have `$line` as its third argument, not `$source`. – ksbg Jul 02 '15 at 07:25
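Applying both fixes from the comment above to the question's script gives roughly the following, wrapped in a hypothetical function name (stripControlAndHighBytes) so it can be reused. One change is an assumption of this sketch: the question's class [\x00-\x1F\x80-\xFF] also matches \x0A (LF) and \x0D (CR), which would join every row of the CSV into one, so those two bytes are excluded here. Note that the \x80-\xFF range removes every non-ASCII byte, not just the BOM.

```php
<?php
// The question's loop with the comment's two fixes applied: fgets() advances
// the file pointer each iteration, and preg_replace() receives the line
// ($line), not the file handle ($source).
function stripControlAndHighBytes(string $sourcePath): void
{
    $tempPath = $sourcePath . 'temp';
    $source = fopen($sourcePath, 'r');
    $target = fopen($tempPath, 'w');
    // Looping on fgets()'s return value also stops cleanly at end of file.
    while (($line = fgets($source)) !== false) {
        // Same class as the question minus \x0A (LF) and \x0D (CR),
        // so the line breaks between CSV rows survive.
        $line = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]/', '', $line);
        fwrite($target, $line);
    }
    fclose($source);
    fclose($target);
    unlink($sourcePath);
    rename($tempPath, $sourcePath);
}
```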

Load the contents of the file into a string with file_get_contents(), convert it with mb_convert_encoding(), and then store the converted string with file_put_contents().

Ioannis Loukeris

Just read the entire file into a string with file_get_contents(), then run it through the mb_convert_encoding() function and save it again. That is all there is to it.

If your file is huge and it isn't practical to load it into memory at once, do it line by line (look up fopen, fgets, etc.).

Erwin Moller
  • The file is encoded in UTF-16LE, so fgets() is a no-go, as it will break the file. Do you have any suggestions? –  Jul 03 '15 at 14:32
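The comment is right that fgets() is a poor fit here: it splits on the single byte 0x0A, but in UTF-16LE a newline is the two bytes 0A 00, so the 00 half lands at the start of the next "line" (and 0x0A can also occur inside other characters, e.g. U+010A is stored as 0A 01). A chunk-based alternative can be sketched as follows, under these assumptions: the helper name and the 64 KiB default chunk size are hypothetical, and the output is meant to be UTF-8 without a BOM. It reads raw blocks, converts only complete 2-byte code units, and holds back a trailing high surrogate so that 4-byte characters are never split across chunks.

```php
<?php
// Hypothetical helper: converts a UTF-16LE file to UTF-8 in fixed-size
// binary chunks, so neither the whole file nor fgets() is needed.
function convertUtf16leToUtf8(string $sourcePath, string $targetPath, int $chunkSize = 65536): void
{
    $chunkSize = max(2, $chunkSize); // the first read must be able to hold the 2-byte BOM
    $source = fopen($sourcePath, 'rb');
    $target = fopen($targetPath, 'wb');
    if ($source === false || $target === false) {
        throw new RuntimeException('Cannot open the input or output file');
    }

    $carry = '';          // bytes held back from the previous chunk
    $isFirstChunk = true;
    while (($chunk = fread($source, $chunkSize)) !== false && $chunk !== '') {
        $buffer = $carry . $chunk;
        // Drop a UTF-16LE BOM (FF FE) at the very start of the file.
        if ($isFirstChunk && strncmp($buffer, "\xFF\xFE", 2) === 0) {
            $buffer = substr($buffer, 2);
        }
        $isFirstChunk = false;

        // Only complete 2-byte code units can be converted.
        $usable = strlen($buffer) - (strlen($buffer) % 2);
        // A high surrogate (D800-DBFF) is the first half of a 4-byte character,
        // so hold it back until its second half arrives. In little-endian order
        // the high byte of a code unit is its second byte.
        if ($usable >= 2) {
            $highByte = ord($buffer[$usable - 1]);
            if ($highByte >= 0xD8 && $highByte <= 0xDB) {
                $usable -= 2;
            }
        }
        fwrite($target, mb_convert_encoding(substr($buffer, 0, $usable), 'UTF-8', 'UTF-16LE'));
        $carry = substr($buffer, $usable);
    }
    fclose($source);
    fclose($target);

    if ($carry !== '') {
        throw new RuntimeException('Input ends in the middle of a UTF-16LE character');
    }
}
```

Usage: convertUtf16leToUtf8('1.csv', '2.csv'); writes a UTF-8 copy and leaves the original in place.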

Since you are dealing with a very big file, I suggest leaving this task to the operating system by means of exec(), shell_exec() or the backtick operator.

See http://mindspill.net/computing/linux-notes/determine-and-change-file-character-encoding/ and the question "Best way to convert text files between character sets?" for methods on how to do just that.

Example: shell_exec ( 'iconv -f utf-16le -t utf-8 1.csv > 2.csv' );
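The shell_exec() line passes the file names straight to the shell. A slightly more defensive sketch escapes each argument with escapeshellarg() and uses exec() so iconv's exit status can be checked. The helper names are hypothetical, the -f/-t options are the ones from the example above, and it assumes a POSIX shell with iconv on the PATH (the assertion shows POSIX-style single quoting).

```php
<?php
// Hypothetical helper: builds the iconv command from the example above with
// every argument shell-escaped, so paths with spaces or quotes are safe.
function buildIconvCommand(string $from, string $to, string $in, string $out): string
{
    return sprintf(
        'iconv -f %s -t %s %s > %s',
        escapeshellarg($from),
        escapeshellarg($to),
        escapeshellarg($in),
        escapeshellarg($out)
    );
}

// Runs the command with exec() instead of shell_exec() so the exit status
// can be checked; a non-zero status means the conversion failed (for example
// invalid input, or iconv not being installed).
function convertWithIconv(string $from, string $to, string $in, string $out): void
{
    exec(buildIconvCommand($from, $to, $in, $out), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("iconv failed with exit status $status");
    }
}

// Usage: convertWithIconv('utf-16le', 'utf-8', '1.csv', '2.csv');
```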

Alex Andrei
  • This worked :) Cheers man. I edited your answer to include my code so if anyone sees it they can get an idea. –  Jul 03 '15 at 16:05