2

Possible Duplicate:
PHP explode the string, but treat words in quotes as a single word.

I have a string that contains quoted text. Can anyone give me a regex to split it up?

this has a \\\'quoted sentence\\\' inside

The quotes may also be single quotes. I'm using preg_match_all.

Right now I run the following call, which gives the output shown after it:

preg_match_all('/\\\\"(?:\\\\.|[^\\\\"])*\\\\"|\S+/', $search_terms, $search_term_set);

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\"quoted
            [4] => sentence\\\"
            [5] => inside
        )

)

I would like this output:

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\"quoted sentence\\\"
            [4] => inside
        )

)

This is NOT a duplicate of this question: PHP explode the string, but treat words in quotes as a single word.

UPDATE:

I've removed the mysql_real_escape_string call. What regex do I need now that I'm just using magic quotes?

madphp
  • You should run the regex on the string *before* using `mysql_real_escape_string`. – gen_Eric Jun 10 '11 at 20:36
  • Yeah, I thought about doing that, using it on each array value. But I guess I thought it would be better to do it just once, before the regex. I will keep that as a Plan B. – madphp Jun 10 '11 at 20:40

3 Answers

1

I'm thinking you might want to use strpos and substr in this case.

This is very sloppy, but hopefully you get the general idea at least.

$string = "This has a 'quoted sentence' in it";

    // Get the string position of every ', " and space.
    // strpos() takes the haystack first, and 0 is a valid position,
    // so each loop compares against false with !== and advances an offset.
    $single_pos_arr = array();
    $offset = 0;
    while (($pos = strpos($string, "'", $offset)) !== false) {
      $single_pos_arr[] = $pos;
      $offset = $pos + 1;  // continue searching after this match
    }
    $double_pos_arr = array();
    $offset = 0;  // reset offset
    while (($pos = strpos($string, '"', $offset)) !== false) {
      $double_pos_arr[] = $pos;
      $offset = $pos + 1;
    }
    $space_pos_arr = array();
    $offset = 0;  // reset offset
    while (($pos = strpos($string, " ", $offset)) !== false) {
      $space_pos_arr[] = $pos;
      $offset = $pos + 1;
    }

Once you have the positions, you can write a simple algorithm to finish the job.
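One way to "finish the job" is sketched below. It is not the position-array approach itself but a single pass over the characters that tracks whether we are inside a quote. The function name `tokenize_quoted` is made up for illustration, and the sketch assumes quotes are balanced and that apostrophes inside words (such as "don't") do not occur.

```php
<?php
// Hypothetical helper: split $input on spaces, keeping quoted runs together.
// Assumes balanced quotes and no stray apostrophes inside words.
function tokenize_quoted($input)
{
    $tokens = array();
    $current = '';
    $quote = null; // the quote character we are currently inside, or null

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($quote !== null) {
            // Inside quotes: keep every character, close on the matching quote.
            $current .= $char;
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === "'" || $char === '"') {
            // Opening quote: remember which kind it is.
            $quote = $char;
            $current .= $char;
        } elseif ($char === ' ') {
            // A space outside quotes ends the current token.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $char;
        }
    }
    // Flush whatever is left after the last space.
    if ($current !== '') {
        $tokens[] = $current;
    }
    return $tokens;
}

print_r(tokenize_quoted("This has a 'quoted sentence' in it"));
```

The quote characters are kept in the token so the result matches the shape of the desired output in the question; strip them with trim($token, "'\"") if they are not wanted.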

Brian Patterson
  • Very nice parser, aside from the atrocious coding standard :) - I was going to recommend writing a parser but they're a bit verbose and I remembered I had a regex that actually did this. – Halcyon Jun 10 '11 at 21:39
0

Why are there slashes in your input string?

Use stripslashes to get rid of them.

Then either write your own tokenizer or use this regex:

preg_match_all("/(\"[^\"]+\")|([^\s]+)/", $input, $matches);
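As a rough sketch of the two steps together, the snippet below strips the slashes first and then uses a variant of the regex that also accepts single quotes, since the question says either kind may appear. The sample value in `$raw` is an assumption that mimics input escaped once; the variable names are illustrative only.

```php
<?php
// Made-up sample: a string whose quotes were escaped once with a backslash.
$raw = "this has a \\'quoted sentence\\' inside";

// Step 1: remove the escaping backslashes.
$input = stripslashes($raw);

// Step 2: match a double-quoted run, a single-quoted run, or a bare word.
// The pattern is in a single-quoted PHP string, so only \' needs escaping.
$pattern = '/"[^"]*"|\'[^\']*\'|\S+/';
preg_match_all($pattern, $input, $matches);

print_r($matches[0]);
```

With this sample, $matches[0] holds the five tokens from the question, with 'quoted sentence' kept as one entry.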

Halcyon
  • They are the output of mysql_real_escape_string(), used to prevent SQL injection. – madphp Jun 10 '11 at 20:26
  • @madphp: You should run the regex on the string *before* using `mysql_real_escape_string`. – gen_Eric Jun 10 '11 at 20:36
  • This is true, mysql_real_escape_string should be the last thing you do – Brian Patterson Jun 10 '11 at 20:37
  • Just want to point out that the string has been escaped TWICE, either with mysql_real_escape_string or another function like addslashes, or maybe you are using magic quotes in your version of PHP. In any case, this presents a bit of -escaping redundancy-. Just something to keep in mind when you are debugging – hndcrftd Jun 10 '11 at 20:41
  • OK, thanks for pointing that out to me. However, by using magic quotes along with mysql_real_escape_string, I'm trying to protect against multi-byte character encoding issues. See http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string. If I'm wrong to do this, please let me know. I assumed magic quotes is similar to addslashes, in that it doesn't see multi-byte character encoding. – madphp Jun 10 '11 at 20:54
  • First off, there's no harm in doing that other than it's not very efficient and not very debug-friendly. You just have to remember to un-escape twice. Next, you are correct in assuming that magic quotes is not multi-byte safe, however mysql_real_escape_string is multi-byte safe. So, instead of adding one on top of the other, you can just use mysql_real_escape_string by itself. I would avoid using both magic quotes and addslashes altogether, because they may -generate- multi-byte characters from malformed strings, and just apply mysql_real_escape_string before the MySQL query runs – hndcrftd Jun 10 '11 at 22:59
0

Too long for a comment, even though it's actually a comment.

I don't understand how it's not a duplicate. Use the principle from that link and replace the quotes with triple-backslashed quotes:

$text = "this has a \\\\\'quoted sentence\\\\\' inside and then \\\\\'some more\\\\\' stuff";
print $text; //check input
$pattern = "/\\\{3}'(?:[^\'])*\\\{3}'|\S+/";
preg_match_all($pattern, $text, $matches);
print_r($matches);

and you get what you need. It's pretty much 100% copy of the link you posted with the only change being exactly what the guy suggested to do if you wanted to change the delimiters.

Edit: Here's my output:

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\'quoted sentence\\\'
            [4] => inside
            [5] => and
            [6] => then
            [7] => \\\'some more\\\'
            [8] => stuff
        )

)

Edit2: Are you checking for single or double quotes after 3 slashes (your input and output arrays don't match if all you're doing is matching), or are you changing single quotes after three slashes in the input to triple-slash double quotes in the output? If all you're doing is matching, just change the two single quotes in the pattern to escaped double quotes, or wrap the pattern in single quotes so you don't have to escape the double quotes.
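To illustrate the point about picking a string delimiter that avoids extra escaping, here is a minimal sketch that puts both the sample text and the pattern in nowdoc strings (PHP 5.3+), where PHP itself processes no escapes. It assumes input with a single backslash before each quote, which is not the triple-backslash input above, and it accepts either quote type. The sample text is made up.

```php
<?php
// Nowdoc: the text is taken literally, so each \ below is one real backslash.
$text = <<<'TEXT'
this has a \'quoted sentence\' inside and then \"some more\" stuff
TEXT;

// In the regex, \\ matches one literal backslash. The alternatives are:
// a run delimited by \" ... \", a run delimited by \' ... \', or a bare word.
$pattern = <<<'REGEX'
/\\"[^"]*\\"|\\'[^']*\\'|\S+/
REGEX;

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

The `[^"]*` and `[^']*` parts first swallow the closing backslash and then give it back through backtracking, the same mechanism the triple-backslash pattern relies on.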

NorthGuard
  • I'm not saying it isn't a duplicate. I have magic quotes switched on, but in this script I want to double up with mysql_real_escape_string as added protection. Again, if I'm wrong to do this, please let me know. – madphp Jun 10 '11 at 21:31
  • You have 'NOT a duplicate' with 'NOT' in all caps :p But anyways, I don't have magic quotes on so I'm not sure what that does to your delimited single quotes. All I know is if the input (that's echo'd) looks like `this has a \\\"quoted sentence\\\" inside` then the pattern `'/\\\{3}"(?:[^\"])*\\\{3}"|\S+/'` will get you what you want. Note that pattern is different from my main post, it's for a double quote after 3 slashes. – NorthGuard Jun 10 '11 at 21:43
  • I've decided to stop using mysql_real_escape_string for now. What is the regex for single backslashes? – madphp Jun 10 '11 at 22:37