Here's the problem: I need to post a .csv file from one server to another. I do this by reading the contents of the .csv file and sending them with cURL as POST data. This works without problems.

But the trouble begins when I try to parse the data and store it in a database table. I have all the values in an array; if I print this array, it displays correctly. But if I echo a single value from that array, I get all kinds of weird characters.

My best guess is that it has something to do with the encoding of the CSV file, but I have no idea how to fix that.

Here's the function I use to parse the CSV data:

public function parseCsv($data)
{
    $quote = '"';
    $newline = "\n";
    $seperator = ';';
    $dbQuote = $quote . $quote;

    // Clean up file
    $data = trim($data);
    $data = str_replace("\r\n", $newline, $data);
    $data = str_replace($dbQuote,'"', $data);
    $data = str_replace(',",', ',,', $data);
    $data .= $seperator; 

    $inquotes = false;
    $startPoint = $row = $cellNo = 0;

    for($i=0; $i<strlen($data); $i++) {
        $char = $data[$i];
        if ($char == $quote) {
            if ($inquotes) $inquotes = false;
            else $inquotes = true;
        }        

        if (($char == $seperator or $char == $newline) and !$inquotes) {
            $cell = substr($data,$startPoint,$i-$startPoint);
            $cell = str_replace($quote,'',$cell);
            $cell = str_replace('&quot;',$quote,$cell);

            $result[$row][$this->csvMap[$cellNo]] = $this->_parseValue($cellNo, $cell);
            ++$cellNo;
            $startPoint = $i + 1;
            if ($char == $newline) {
                $cellNo = 0;
                ++$row;
            }
        }
    }
    return $result;
}

Any help is appreciated!

EDIT: OK, so after some more trial and error I found out that only the very first value of the first row has some extra characters. If I echo that value, everything I output after it gets messed up. So I tried changing the encoding; now if I echo the value it is all good, but I have a new problem: it is a string, but I need an int:

echo $val; // output: 7655, but messes up everything output after it
$val = mb_convert_encoding($val, "UTF-8");
echo $val; // output: 7655
echo intval($val); // output: 0

EDIT: expected output:

7655Array ( [kenmerk] => ÿþ7655 [status] => 205 [status_date] => 1991-12-30 [dob] => 1936-09-04 ) succes

Messed-up output:

7655牁慲੹ਨ††歛湥敭歲⁝㸽@㟾㘀㔀㔀਀††獛慴畴嵳㴠‾㈀㤀㔀਀††獛慴畴彳慤整⁝㸽 201ⴱ㄀㈀ⴀ30 †嬠潤嵢㴠‾㄀㤀㘀㘀-08〭㐀਀਩畳捣獥

I first echo the element 'kenmerk' and then print the array. As you can see, the element 'kenmerk' in the array has some extra characters.

Converting the data to UTF-8 like so: $data = mb_convert_encoding($data, "UTF-8");

eliminates the problem with the messed-up output and removes the 'ÿþ' (an incorrectly interpreted BOM?), but I still can't convert the values to an int.

EDIT:

OK, I sort of found a solution, but as I have no idea why it works, I'd appreciate any info:

var_dump((int) $val); // output: 0
var_dump((int) strip_tags($val)); // output: 7655
user458753
  • 1) Why aren't you using `fgetcsv` to parse CSVs? 2) What kind of problems do you have? Show a "normal" array and a "messed up" value. – deceze Feb 16 '12 at 09:36
  • 1) I used fgetcsv, but after running into problems and not knowing where they came from, I wrote this function. 2) Updated the question. – user458753 Feb 16 '12 at 10:17
  • Please specify ***how*** it "messes everything up". That's a relevant detail. I'd venture a spontaneous guess that you might have a BOM there. Search for that keyword. – deceze Feb 16 '12 at 10:32
  • By "messing up" I mean I get all kinds of Chinese characters instead of normal text. – user458753 Feb 16 '12 at 10:34
  • Please **show an example** and what you would expect instead. That can help diagnose the problem. – deceze Feb 16 '12 at 10:35

1 Answer

You need to remove ÿþ from 7655. intval() and an int cast ($val = (int)$val;) return 0 whenever the string does not start with a numeric value, because PHP only converts the leading numeric part of a string. For example, 765ÿþ5 returns 765.
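
As a minimal sketch of that casting behaviour, the snippet below builds two test strings by hand. It assumes the ÿþ seen in the output is the byte pair FF FE (the UTF-16 little-endian byte order mark displayed as Latin-1); the variable names are made up for illustration.

```php
<?php
// Assumption: "ÿþ" is the two bytes 0xFF 0xFE, i.e. a UTF-16LE byte order mark.
$withBom  = "\xFF\xFE7655"; // BOM bytes in front of the digits
$trailing = "765\xFF\xFE5"; // same bytes in the middle of the digits

// The cast stops at the first byte that cannot be part of a number.
var_dump((int) $withBom);   // int(0)
var_dump((int) $trailing);  // int(765)
var_dump(intval("7655"));   // int(7655)
```

The value looks like 7655 when echoed because a browser or terminal may not render the extra bytes, but the cast still sees them.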

Regarding your first problem, I would also recommend that you read this answer: "PHP messing with HTML Charset Encoding".

I hope it gives you more clarity about what you are struggling with.

I would also make your stripping process more robust, so that it matches, e.g., 7655 instead of ÿþ7655.
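
One way to make that stripping step more robust could look like the sketch below. It is not taken from the question: normalizeCsvEncoding() is a hypothetical helper, the sample row is rebuilt from the values in the question, and it assumes the mbstring extension is available and that the file carries a BOM at all. It checks the first bytes for a known BOM, converts UTF-16 input to UTF-8 with an explicit source encoding, and only then parses and casts.

```php
<?php
/**
 * Hypothetical helper: return the CSV data as UTF-8 without a leading BOM.
 * Only UTF-16LE, UTF-16BE and UTF-8 BOMs are handled in this sketch.
 */
function normalizeCsvEncoding(string $data): string
{
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        // UTF-16 little-endian BOM ("ÿþ" when shown as Latin-1)
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {
        // UTF-16 big-endian BOM
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
    }
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // UTF-8 BOM: the text is already UTF-8, just drop the marker
        return substr($data, 3);
    }
    return $data; // no BOM found, leave the data untouched
}

// Simulated POST body: a UTF-16LE file with BOM, as the question's output suggests.
$raw = "\xFF\xFE" . mb_convert_encoding("7655;205;1991-12-30;1936-09-04", 'UTF-16LE', 'UTF-8');

$clean = normalizeCsvEncoding($raw);
$row   = str_getcsv($clean, ';', '"', '\\');

var_dump((int) $row[0]); // int(7655)
```

Passing the source encoding explicitly matters here: without the third argument, mb_convert_encoding() falls back to its configured internal encoding rather than inspecting the data.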

Community
Diblo Dk