
I'm having trouble parsing a CSV file in PHP, using fopen() to read API data.

My code works when I use a URL that displays the CSV file in the browser, as in 1) below. But I get random characters in the output when I use a URL that ends in format=csv, as in 2) below.

1) Working URL (returns the expected values): https://www.kimonolabs.com/api/csv/duo2mkw2?apikey=yjEl780lSQ8IcVHkItiHzzUZxd1wqSJv

2) Non-working URL (returns random characters): https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv

Here is my code, using URL (2) above:

<?php

$f_pointer = fopen("https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv", "r");

while (!feof($f_pointer)) {
    $ar = fgetcsv($f_pointer);
    echo $ar[1];
    echo "<br>";
}
?>


Output for the URL mentioned in (2) above:

root@MorryServer:/# php testing.php

?IU?Q?JL?.?/Q?R??/)?J-.?))VH?/OM?K-NI?T0?P?*ͩT0204jzԴ?H???X???@ D??K


Correct output, if I use the URL type stated in (1):

root@MorryServer:/# php testing.php

PHP Notice: Undefined offset: 1 in /testing.php on line 24
jackpot
€2,893,210

Your Common Sense
Sarah Boland
  • The first URL returns data with what looks like a title as the first value. That is invalid CSV (or at least unconventional), so PHP's CSV functions will have a problem with that. The second URL seems to return valid CSV values. Instead of using `fopen` (which you shouldn't be using anyway; it is a low-level function that should only be used if you have to use it) you should use something like `str_getcsv`, which was made for stuff like this. – Sverri M. Olsen Jul 11 '15 at 10:57
  • Hi Sverri, the first URL is fine, I can remove headers, the issue is the weird output for 2). I require the code to work for the parsehub url – Sarah Boland Jul 11 '15 at 10:59
  • What do you get if you run this: `php -r "echo mb_internal_encoding();"` The second URL looks perfectly fine. It returns a file with mimetype `text/csv`, encoded with `UTF-8`. Slightly older versions of PHP defaulted to using `iso-8859-1`. Newer versions use `utf-8`. What is your PHP version? – Sverri M. Olsen Jul 11 '15 at 11:03
  • Well the returned csv is most likely UTF-8 to get the euro symbol, which isn't a valid ISO-8859-1 character – Mark Baker Jul 11 '15 at 11:05
  • Sorry, I ran `php -r "echo mb_internal_encoding();"` in the terminal on my server. Or did you want me to add that to my PHP file? – Sarah Boland Jul 11 '15 at 11:08
  • Nah, the problem seems to be that your server is using a different encoding (iso-8859-1) than the returned data is using (UTF-8). It is like writing English using Cyrillic characters; the content is the same but the characters are all out of whack. If you have access to the server's configuration then you can [fix it there](http://stackoverflow.com/questions/9351694/setting-php-default-encoding-to-utf-8), or you can resort to [converting the encoding](http://php.net/manual/en/function.mb-convert-encoding.php) before using the data. – Sverri M. Olsen Jul 11 '15 at 11:20
  • Thanks, I had a look in my php5 folder and I see this, ; PHP's default character set is set . - Default is UTF-8 already :/ ; http://php.net/default-charset ;default_charset = "UTF-8" [iconv] ;iconv.input_encoding = ISO-8859-1 ;iconv.internal_encoding = ISO-8859-1 ;iconv.output_encoding = ISO-8859-1 ;mssql.charset = "ISO-8859-1" – Sarah Boland Jul 11 '15 at 11:27

1 Answer


This is an encoding problem, although it concerns the HTTP content encoding rather than the character set.

The given file contains UTF-8 characters. These are read by the fgetcsv function, which is binary safe. Line endings are in Unix format ("\n").

The output on the terminal is scrambled. Looking at the headers sent, we see:

GET https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv --> 200 OK
Connection: close
Date: Sat, 11 Jul 2015 13:15:24 GMT
Server: nginx/1.6.2
Content-Encoding: gzip
Content-Length: 123
Content-Type: text/csv; charset=UTF-8
Last-Modified: Fri, 10 Jul 2015 11:43:49 GMT
Client-Date: Sat, 11 Jul 2015 13:15:23 GMT
Client-Peer: 107.170.197.156:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
Client-SSL-Cert-Subject: /OU=Domain Control Validated/OU=PositiveSSL/CN=www.parsehub.com

Note the Content-Encoding: gzip header: fgetcsv reading from a URL evidently doesn't handle gzip encoding. The scrambled string is just the gzipped content of the "file".
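The header check described above could be sketched in PHP roughly as follows. The helper name `declaresGzip` is hypothetical, and the header lines are copied from the response listed above. In a real script they could come from `$http_response_header`, which PHP's http wrapper fills in after an fopen() on a URL.

```php
<?php
// Hypothetical helper: returns true when a list of raw header lines
// declares a gzip content encoding.
function declaresGzip(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence the "i" flag.
        if (preg_match('/^Content-Encoding:\s*(.+)$/i', trim($line), $m)) {
            return stripos($m[1], 'gzip') !== false;
        }
    }
    return false;
}

// Header lines taken from the response shown above.
$headers = [
    'Content-Encoding: gzip',
    'Content-Length: 123',
    'Content-Type: text/csv; charset=UTF-8',
];

echo declaresGzip($headers) ? "gzip\n" : "plain\n";
```

If the check reports gzip, the body has to be decompressed before it is handed to a CSV parser.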

Use PHP's zlib extension to decompress the data before parsing it. Proof:

srv:~ # lwp-download 'https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv' data
123 bytes received
srv:~ # file data
data: gzip compressed data, was "tcW80-EcI6Oj2TYPXI-47XwK.csv", from Unix, last modified: Fri Jul 10 11:43:48 2015, max compression
srv:~ # gzip -d < data
"title","jackpot"
"Lotto Results for Wednesday 08 July 2015","€2,893,210"
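The same decompression step can be done in PHP with the zlib extension. The following is only a sketch: the response body is simulated with `gzencode()` so that it runs without network access, and splitting on "\n" is a simplification that assumes no quoted field contains a line break.

```php
<?php
// Stand-in for the downloaded body: gzip-compress the CSV shown above.
$csv  = "\"title\",\"jackpot\"\n"
      . "\"Lotto Results for Wednesday 08 July 2015\",\"€2,893,210\"\n";
$body = gzencode($csv, 9);

// gzip data starts with the magic bytes 0x1f 0x8b; only decode in that case.
if (strncmp($body, "\x1f\x8b", 2) === 0) {
    $body = gzdecode($body);
}

// Parse each line of the decompressed text as one CSV record.
$rows = [];
foreach (explode("\n", trim($body)) as $line) {
    $rows[] = str_getcsv($line, ',', '"', '\\');
}

print_r($rows);
```

In a real script, `$body` would be whatever the HTTP request returned, for example via `file_get_contents()` on the URL.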

To get the proper output, only a minimal change is needed: just add the compress.zlib:// stream wrapper:

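<?php

        // compress.zlib:// decompresses the gzip-encoded response while reading
        $f_pointer = fopen("compress.zlib://https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv", "r");

        if ($f_pointer === false) {
                die("invalid URL");
        }

        $ar = array();
        // fgetcsv() returns false at end of file, so no bogus final row is appended
        while (($row = fgetcsv($f_pointer)) !== false) {
                $ar[] = $row;
        }
        fclose($f_pointer);

        print_r($ar);

?>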

Outputs:

Array
(
    [0] => Array
        (
            [0] => title
            [1] => jackpot
        )

    [1] => Array
        (
            [0] => Lotto Results for Wednesday 08 July 2015
            [1] => €2,893,210
        )

)
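To try the wrapper without depending on the remote API, a local gzip file can stand in for the URL. This is a minimal sketch: the temporary file and its contents are assumptions made for illustration, not part of the original setup.

```php
<?php
// Write a small gzip-compressed CSV to a temporary file
// (a local stand-in for the remote resource).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, gzencode(
    "\"title\",\"jackpot\"\n"
    . "\"Lotto Results for Wednesday 08 July 2015\",\"€2,893,210\"\n"
));

// The wrapper prefix is the only difference from a plain fopen() call.
$fp = fopen('compress.zlib://' . $path, 'r');
if ($fp === false) {
    die("could not open $path");
}

$rows = [];
while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($fp);
unlink($path);

print_r($rows);
```

Opening the same file without the `compress.zlib://` prefix would hand the raw compressed bytes to fgetcsv, which is the situation described in the question.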
Axel Amthor