I seem to be in a catch-22 with a small app I'm developing in PHP on Google App Engine using Quercus:

  1. I have a remote CSV file which I can download and store in a string.
  2. To parse that string I'd ideally use str_getcsv, but Quercus doesn't have that function yet.
  3. Quercus does seem to know fgetcsv, but that function expects a file handle, which I don't have (and I can't make a new one, as GAE doesn't allow files to be created).

Does anyone have an idea of how to solve this without having to give up on the built-in PHP CSV-parsing functions and write my own parser instead?

futtta
  • If Quercus has SplFileObject, you can use the approach given in http://stackoverflow.com/questions/2805427/how-to-extract-data-from-csv-file-in-php/2805486#2805486 – Gordon Aug 08 '11 at 07:56
  • @gordon: the problem is that GAE doesn't allow the creation of files. – futtta Aug 08 '11 at 08:10
  • Does it allow `new SplTempFileObject(-1)`? That would be in-memory then. – Gordon Aug 08 '11 at 08:15
  • Speaking as someone who's currently reading Catch 22, this isn't it. ;) – Nick Johnson Aug 08 '11 at 09:21
  • Not a bad idea, but no; "SplTempFileObject is an unknown class name". I also tried fopen("php://memory", "rw"), but that didn't work either ("java.lang.IllegalStateException: Cannot marsahl false to BinaryOutput"). – futtta Aug 08 '11 at 09:32
  • Hmm. Then I don't know. The last straw (but it involves `tmpfile()`) would be to try the approach given in http://pear.php.net/package/PHP_Compat/docs/1.6.0a3/__filesource/fsource_PHP_Compat__PHP_Compat-1.6.0a3CompatFunctionstr_getcsv.php.html – Gordon Aug 08 '11 at 11:30
  • as tmpfile creates a file in the tmp directory which GAE doesn't allow, that doesn't work either, but thanks for the valuable pointers Gordon! – futtta Aug 08 '11 at 12:26
  • Maybe you can use data-uris with Quercus?: http://php.net/manual/en/wrappers.data.php – hakre Aug 13 '11 at 13:25
  • Wait, I don't understand why you can't just `fopen()` the remote stream, then `fgetcsv()` it - care to clarify? – Jonathan Chan Aug 13 '11 at 20:35
  • Parsing a csv file using regex is not that hard, I don't think it is a bad idea in your situation. – nobody Aug 14 '11 at 10:52
  • @jonathan; seems logical, but fopening a remote file doesn't work either, error msg is "com.caucho.quercus.QuercusModuleException: java.lang.NoClassDefFoundError: java.net.Socket is a restricted class." – futtta Aug 14 '11 at 16:52
  • @nobody: indeed, and regex isn't even needed. I'm doing this: `$rd_array=explode("\r\n",$requestData); foreach ($rd_array as $request) { /* do stuff with $request */ }`. Works like a charm, but a native function is bound to be less error-prone & should provide better performance, so ... – futtta Aug 14 '11 at 16:55
  • @futtta I see - that's really weird that they'd disallow Sockets - how are you downloading the CSV file, though? Through the Quercus application, or manually? – Jonathan Chan Aug 14 '11 at 21:54
  • @jonathan: Curl is implemented in Quercus for that purpose. – futtta Aug 15 '11 at 06:55
  • @hakre: a great idea, but I just did some tests and data URIs don't work. When trying to fopen I get "Warning: data: cannot be read [fopen]". Too bad. – futtta Aug 15 '11 at 07:20
  • @futtta: Hmm, I had hoped for it, I'm not fluent with Quercus. I think `str_getcsv` is really the missing link here (was so in earlier times in Zend PHP as well). Maybe you can file a feature request with Quercus? – hakre Aug 15 '11 at 07:26
  • Why was this question tagged Java? Removed the Java tag. –  Aug 16 '11 at 14:51
  • @loudsight: it wasn't in there originally, someone added it (because Quercus is a Java app that implements PHP, I guess). Anyway, fine with, fine without :) – futtta Aug 16 '11 at 15:29

5 Answers

1

I think the simplest solution really is to write your own parser. It's a piece of cake anyway and will get you to learn more regex. It makes no sense that there is no CSV-string-to-array parser in PHP, so it's totally justified to write your own. Just make sure it's not too slow ;)
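
As a rough illustration of that idea, here is a minimal regex-based sketch in plain PHP. The function names parseCsvLine and parseCsvString are made up for this example, and it assumes comma-separated fields, a doubled quote ("") as the escaped quote, and no line breaks inside quoted fields. It has not been tried on Quercus or GAE.

```php
<?php
// Sketch only: parseCsvLine and parseCsvString are hypothetical names.
// Assumptions: comma delimiter, "" escapes a quote, no newlines inside quoted fields.

function parseCsvLine($line) {
    $fields = array();
    // One match per field: start of line or a comma, then a quoted or a bare value.
    $pattern = '/(?:^|,)(?:"((?:[^"]|"")*)"|([^,"]*))/';
    preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (count($m) === 3) {
            // Group 2 took part in the match: bare (unquoted) value.
            $fields[] = $m[2];
        } else {
            // Only group 1 is present: quoted value, so collapse "" to ".
            $fields[] = str_replace('""', '"', $m[1]);
        }
    }
    return $fields;
}

function parseCsvString($csv) {
    $rows = array();
    foreach (explode("\n", $csv) as $line) {
        $line = rtrim($line, "\r"); // tolerate \r\n line endings
        if ($line === '') {
            continue; // skip blank lines
        }
        $rows[] = parseCsvLine($line);
    }
    return $rows;
}

$rows = parseCsvString("id,name\r\n1,\"Doe, John\"\r\n2,\"say \"\"hi\"\"\"\r\n");
print_r($rows);
```

Splitting on line breaks first keeps the regex small, at the cost of not supporting quoted fields that span several lines.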

Morg.
0

You might be able to create a new stream wrapper using stream_wrapper_register.

Here's an example from the manual which reads global variables: http://www.php.net/manual/en/stream.streamwrapper.example-1.php

You could then use it like a normal file handle:

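$csvStr = '...';
// 'var://' only resolves once the manual's example wrapper is registered via stream_wrapper_register
$fp = fopen('var://csvStr', 'r+');
// fgetcsv returns false at end of file, so compare against false explicitly
while (($row = fgetcsv($fp)) !== false) {
    // ...
}
fclose($fp);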
Long Ears
  • great idea long ears, but no cigar; copy/pasted the code of the example on php.net and the code dies when trying to stream_wrapper_register. too bad, looks like there's no solution (except for parsing the csv-in-a-string "manually"). – futtta Aug 21 '11 at 12:12
0

This shows a simple manual parser I wrote, with example input covering qualified values, non-qualified values, and the escape feature. It can be used for both the header and the data rows, and includes an assoc array function to turn your data into a KVP-style array.

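//example data: a header row, then a data row with qualified, non-qualified and escaped values
$fields = strparser('"first","second","third","fourth","fifth","sixth","seventh"');
print_r(makeAssocArray($fields, strparser('"asdf","bla\"1","bl,ah2","bl,ah\"3",123,34.234,"k;jsdfj ;alsjf;"')));


//do something like this, where <csvfirstline> stands for your header line
//and $lines holds the remaining data lines:
//  $fields = strparser(<csvfirstline>);
//  foreach ($lines as $line)
//      $data = makeAssocArray($fields, strparser($line));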


function strparser($string, $div = ",", $qual = "\"", $esc = "\\") {
    $buff = "";
    $data = array();
    $inQual = false; //currently parsing inside a qualifier
    $len = strlen($string);

    //iterate through the string byte by byte
    for ($i = 0; $i < $len; $i++) {
        switch ($string[$i]) {
            case $esc:
                //add the next byte to the buffer as-is and skip it
                if ($i + 1 < $len) {
                    $buff .= $string[$i + 1];
                    $i++;
                }
                break;
            case $qual:
                //entering or leaving a qualified value; the qualifier itself is not stored
                $inQual = !$inQual;
                break;
            case $div:
                if (!$inQual) {
                    $data[] = $buff;    //add value to data
                    $buff = "";         //reset buffer
                    break;
                }
                //divider inside a qualified value: fall through and keep it
            default:
                $buff .= $string[$i];
        }
    }
    //get the last item, as it doesn't have a divider after it
    $data[] = $buff;
    return $data;
}

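function makeAssocArray($fields, $data) {
    $array = array(); //start empty so an empty header row still returns an array
    foreach ($fields as $key => $field)
        $array[$field] = isset($data[$key]) ? $data[$key] : null; //short rows get null
    return $array;
}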
duante
  • I already tried that duante, doesn't work on GAE (cfr. http://stackoverflow.com/questions/6979114/parse-remote-csv-file-with-php-on-gae#comment-8447584) – futtta Sep 24 '11 at 11:27
0

If it can be quick and dirty, I would just use exec (http://php.net/manual/en/function.exec.php) to pass it in and use sed and awk (http://shop.oreilly.com/product/9781565922259.do) to parse it. I know you wanted to use the PHP parser. I've tried before and failed simply because it's not vocal about its errors. Hope this helps. Good luck.

nick
0

You might be able to use fopen with php://temp or php://memory (php.net) to get it to work: open either php://temp or php://memory, write the CSV string to it, rewind it (php.net), and then pass the handle to fgetcsv. I didn't test this, but it might work.
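
The steps above could look roughly like the sketch below in standard PHP. The helper name csvStringToRows is hypothetical, and the stream is opened with mode 'w+' (read and write). This is untested on Quercus or GAE; the asker reported in the comments that fopen("php://memory", "rw") failed there, so it may not help on that platform.

```php
<?php
// Sketch only: csvStringToRows is a hypothetical helper name.
// Standard PHP behaviour is assumed; Quercus on GAE may not support php://memory.

function csvStringToRows($csv) {
    // Open an in-memory stream for reading and writing.
    $fp = fopen('php://memory', 'w+');
    if ($fp === false) {
        return false; // the stream could not be opened
    }

    fwrite($fp, $csv); // copy the CSV string into the stream
    rewind($fp);       // move back to the start before reading

    $rows = array();
    // fgetcsv returns false at end of file; delimiter, enclosure and escape are passed explicitly.
    while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }

    fclose($fp);
    return $rows;
}

$rows = csvStringToRows("a,b\n\"c,d\",e\n");
print_r($rows);
```

php://temp can be swapped in for php://memory; in standard PHP it spills to a temporary file once the data grows past a size limit, which is presumably not allowed on GAE.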

Jason
  • @futtta in that case writing a parser would probably be your best bet: do an explode on the line, then use regular expressions to properly deal with anything that has a comma inside the quotes. – Jason Nov 08 '11 at 20:27