3

Possible Duplicate:
Split string by delimiter, but not if it is escaped

I have a string generated from an IBM Informix database whose fields are separated by pipe (|) characters. There are some data errors, which means a backslash followed by a pipe (\|) can appear inside the data. I want to split these strings only at a bare pipe, not at backslash + pipe (\|) or at other signs combined with the pipe.

This is my code, but it works only for the pipe character:

foreach(glob("ssbstat.unl") as $file)
{ 
    $c=0;       
    if(($load = fopen($file, "r")) !== false)
    { 
        $line = fgets($load);           
        $count= count(explode('|', $line));
        echo $fm= str_repeat('%[^|]|', $count)."%s\n";      

        do
        {
            echo $line;
            print_r($line);
            if($c++>10) break;
        } while ($line = fscanf($load, $fm));
    }
}

Can anyone help me do this?

lankitha

3 Answers

3

Do it like this:

<?php
$line = preg_replace("/([^\\\])\|/", "$1 |", "Hi \|error\| man|ok man|perfect man");
print_r(preg_split('/[^\\\]\|/', $line));

Will output:

Array ( [0] => "Hi \|error\| man" [1] => "ok man" [2] => "perfect man" )

Tested!

Edit: Like Maerlyn said, this is also possible:

<?php
$line = "Hi \|error\| man|ok man|perfect man";
print_r(preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $line));
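As a rough sketch of how the (*SKIP)(*FAIL) pattern above behaves, the snippet below wraps it in a helper. The function name `split_unescaped_pipes` and the sample string are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the answer): split on unescaped pipes only.
function split_unescaped_pipes(string $line): array
{
    // '\\\\.' in a single-quoted PHP string is the regex \\. which matches a
    // backslash plus the character it escapes. (*SKIP)(*FAIL) rejects that
    // pair as a split point and resumes matching after it, so only the
    // second alternative, a bare pipe, ever splits the string.
    return preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $line);
}

// Sample input with two escaped pipes and one empty field.
$fields = split_unescaped_pipes('Hi \|error\| man||ok man');
print_r($fields);
```

Empty fields are kept because `preg_split` is called without `PREG_SPLIT_NO_EMPTY`. Since the pattern consumes a backslash together with whatever follows it, a doubled backslash directly before a pipe is treated as an escaped backslash and the pipe after it still splits.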
noob
1

You can do this with preg_split. The negative lookbehind (?<![\\\\]) specifies that pipes preceded by a backslash should be ignored (the four backslashes are required for proper escaping). You can add any other character you want to ignore inside the [].

print_r(preg_split('/(?<![\\\\])\|/', 'This\|is a|test|string'));
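Applied to the line-by-line reading from the question, the lookbehind split might look like the sketch below. The in-memory stream and the sample rows are assumptions standing in for the real `ssbstat.unl` file, and turning `\|` back into `|` afterwards is an optional extra step that the answer does not mention.

```php
<?php
// Hypothetical sample data standing in for a few lines of the .unl file.
$sample = "1|Hi \\|error\\| man|ok\n2|plain|row\n";

// Use an in-memory stream so the sketch runs without a real file.
$load = fopen('php://memory', 'r+');
fwrite($load, $sample);
rewind($load);

$rows = [];
while (($line = fgets($load)) !== false) {
    // Split only on pipes that are not preceded by a backslash.
    $fields = preg_split('/(?<![\\\\])\|/', rtrim($line, "\r\n"));
    // Optional: turn the escaped \| back into a plain | inside each field.
    $rows[] = str_replace('\|', '|', $fields);
}
fclose($load);

print_r($rows);
```

One limitation of the lookbehind is that it only inspects the single character before the pipe, so an escaped backslash immediately followed by a delimiter pipe would not be split.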
Michael Mior
  • This won't work! A string like `'This\|is a|test|string'` will return this: `Array ( [0] => "This\|is " [1] => "tes" [2] => "string" )`, because the character before `|` that isn't a backslash becomes part of the delimiter too, so it is removed. That's why you should use preg_replace beforehand (as in my answer). – noob Dec 22 '11 at 13:57
  • Good catch. I really should have been using a negative lookbehind. `preg_replace` isn't necessary. Answer updated. – Michael Mior Dec 22 '11 at 14:13
-1

Replace backslash + pipe with a placeholder, then explode by the pipe, then replace the placeholder with backslash + pipe again.
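A minimal sketch of these three steps follows. The helper name `split_with_placeholder` and the NUL byte used as the default placeholder are assumptions, not something the answer specifies; the guard at the top is an added safety check against the placeholder already occurring in the data.

```php
<?php
// Hypothetical helper illustrating the placeholder approach.
function split_with_placeholder(string $line, string $placeholder = "\x00"): array
{
    // Refuse to continue if the placeholder already occurs in the data,
    // since restoring it afterwards would corrupt that field.
    if (strpos($line, $placeholder) !== false) {
        throw new InvalidArgumentException('Placeholder occurs in the input');
    }
    $masked = str_replace('\|', $placeholder, $line); // hide escaped pipes
    $fields = explode('|', $masked);                  // split on the rest
    return str_replace($placeholder, '\|', $fields);  // restore escaped pipes
}

print_r(split_with_placeholder('This\|is a|test|string'));
```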

Massimiliano Arione
  • What kind of placeholder do you suggest? It would have to be something that couldn't possibly appear in the data or your last replacement may corrupt it. – Wesley Murch Dec 22 '11 at 13:37
  • Anything that is unlikely to be present in your string. Something like "{[%my_great_placeholder%]}" or so. – Massimiliano Arione Dec 22 '11 at 13:39
  • I have done it, but there are about 100000 lines and it is taking a hell of a long time to do this. I want a shorter method. Can this regular expression be developed? – lankitha Dec 22 '11 at 13:39
  • @MassimilianoArione: It would *probably* work, but the "probably" is what keeps it from being a good idea; one that you would actually feel safe using. Unless you can be 100% sure what the data contains (or does not contain), there's no possible placeholder that you could use safely. Regular expressions are the solution for this kind of thing. – Wesley Murch Dec 22 '11 at 13:41
  • A regexp is not faster than a simple str_replace; indeed, it's slower. – Massimiliano Arione Dec 22 '11 at 13:42
  • If there is a way to edit this regular expression, it can be done: it should select only the pipe sign, not other signs combined with the pipe. – lankitha Dec 22 '11 at 13:43
  • @lankitha: What does "other signs with pipe" mean? Each character is of course going to be next to another one, whether it's a space, newline, tab, backslash, etc. You need to be more explicit, maybe show some sample data. – Wesley Murch Dec 22 '11 at 13:44
  • @Madmartigan Probability is just a matter of betting. I can bet whatever you want that your text does not contain the string "{[%#my_super_great_flicking_placeholder#%]}" – Massimiliano Arione Dec 22 '11 at 13:45
  • But guess what, now this page contains that very text. If I were running your idea on this page (of course, without *knowing* what the page contained) it would fail. Gambling is for fools! :) – Wesley Murch Dec 22 '11 at 13:47
  • I think you're completely missing my point... but I'll retire my argument. – Wesley Murch Dec 22 '11 at 13:51
  • No, I see it. But I just think you can get a good enough probability with a good choice of text. Hey, you can also change it to something dynamic, maybe including the current date in it, or even a sha1. – Massimiliano Arione Dec 22 '11 at 14:04