
This one made me doubt my sanity. I'm currently working on a simple script that should read URLs from a CSV file and then run a regex against them, but my pattern kept failing. On closer inspection I noticed something strange: the strings that fgetcsv returns show a completely wrong length when displayed with var_dump(). Any idea what is going on here and how to sanitize these strings?

Sample-Code:

<?php
$read = fopen("input.csv","r");
while($data = fgetcsv($read,null,",","\"","\\")){
    var_dump($data[0]);
    echo mb_detect_encoding($data[0]);
    echo "\n";
}
?>

And the response looks like this:

string(25) "/index.html"
ASCII
string(23) "/login.html"
ASCII
string(15) "/insta/"
ASCII

The strings are the same as those in my CSV file, but as you can see, the reported string length isn't right. What is going on here? Are there invisible characters I'm not seeing? Is it some strange encoding problem? How can I fix this?

Gedi Nixan
  • Split the string, iterate over each character and use https://www.php.net/manual/en/function.ord.php. This will show you what you have. Alternatively you could try `trim`. – user3783243 Jan 05 '23 at 18:40
  • Does this answer your question? [Remove BOM (ï»¿) from imported .csv file](https://stackoverflow.com/questions/32184933/remove-bom-%c3%af-from-imported-csv-file). In fact, the string `"/index.html"` is `\u0022\uFEFF\u002F\u0069\u006E\u0064\u0065\u0078\u002E\u0068\u0074\u006D\u006C\u0022` (including the double quotes); it starts with `"` (U+0022, *Quotation Mark*), then an invisible U+FEFF (*Zero Width No-Break Space*), then `/` (U+002F, *Solidus*). – JosefZ Jan 05 '23 at 20:38
  • @user3783243 Thank you, I tried that. For "/index.html" I got the following output: `239 187 191 47 0 105 0 110 0 100 0 101 0 120 0 46 0 104 0 116 0 109 0 108 0` I guess the zeros are the problem? But what are they, where do they come from, and how do I get rid of them? – Gedi Nixan Jan 06 '23 at 08:44
  • I think that may be UTF-16 encoding. Google "convert from utf-16 to utf-8" – Barmar Jan 06 '23 at 15:49
  • Thank you. I've tried it and thrown everything I found at it (mb_convert_encoding(), iconv(), ...) but haven't had any luck yet – Gedi Nixan Jan 08 '23 at 16:43
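
The byte values quoted in the comments look like a UTF-8 BOM (239 187 191) followed by UTF-16LE text: 3 BOM bytes plus 11 characters at 2 bytes each gives the 25 that var_dump() reports. One plausible reason converting each field after fgetcsv() does not help is that fgetcsv() splits lines on the single 0x0A byte, so the second byte of each two-byte newline stays attached to the next line (which would explain lengths 23 and 15). Below is a minimal sketch, assuming that diagnosis is right and the file is small enough to load into memory: strip the BOM, convert the whole content once, then parse. The helper names `dumpBytes` and `readMixedCsv` are hypothetical, and the sample bytes are rebuilt from the output above, not read from the real input.csv.

```php
<?php
// Hypothetical helper: render every byte as hex so invisible bytes become visible.
function dumpBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Hypothetical helper: strip a leading UTF-8 BOM, decode the rest as UTF-16LE,
// then parse the converted text with fgetcsv() through an in-memory stream.
// Assumption: the content really is UTF-16LE behind a UTF-8 BOM.
function readMixedCsv(string $raw): array
{
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $raw = substr($raw, 3);
    }
    $text = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $text);
    rewind($stream);

    $rows = [];
    while (($data = fgetcsv($stream, null, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($stream);

    return $rows;
}

// Stand-in for file_get_contents('input.csv'): UTF-8 BOM + UTF-16LE lines.
$raw = "\xEF\xBB\xBF"
    . mb_convert_encoding("/index.html\n/login.html\n/insta/\n", 'UTF-16LE', 'UTF-8');

echo dumpBytes(substr($raw, 0, 5)), "\n"; // ef bb bf 2f 00

$rows = readMixedCsv($raw);
var_dump($rows[0][0]); // string(11) "/index.html"
```

With the real file, `$raw` would come from `file_get_contents("input.csv")` instead of the stand-in. If the actual bytes differ from the ones quoted above (for example UTF-16BE or no BOM), the source encoding passed to mb_convert_encoding() would need to change accordingly.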

0 Answers