I have this string:

$string = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";

I'm splitting this string on whitespace, commas and some operators (=, <, >, !=, >=, <=, <>) using this code:

$split = preg_split('/\s+|(,|[<>!]?=|<>?|>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Currently, the result of this split is this array:

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => 'New
    [15] => York'
    [16] => and
    [17] => 'Athene'
    [18] => ?
)

Now the only problem I have is that I want the whitespace inside single quotes ('...') not to be split, and the quotes themselves to be removed after the split. In the example above you can see that 'New York' is split into:

[14] => 'New
[15] => York'

My desired outcome is:

[14] => New York

And for 'Athene', I want it to be:

[16] => Athene

So basically, the array above should look like this:

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => New York
    [15] => and
    [16] => Athene
    [17] => ?
)
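For comparison, a minimal sketch that keeps the question's own `preg_split()` approach. It assumes single quotes only ever wrap whole values; the `\B` before the quote stops the apostrophe in `what's` from being read as an opening quote. The quoted value is consumed as a delimiter, and because its text sits in a capturing group, `PREG_SPLIT_DELIM_CAPTURE` returns it without the quotes:

```php
<?php
$string = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";

// Alternatives, tried left to right at each position:
//   \s+                whitespace: split here, nothing captured
//   \B'([^']*)'        a single-quoted value; group 1 keeps the text without quotes
//                      (\B: the quote must not directly follow a word character,
//                      so the apostrophe in "what's" is left alone)
//   (,|[<>!]?=|<>?|>)  a comma or an operator; group 2 keeps it as its own element
$split = preg_split(
    "/\s+|\B'([^']*)'|(,|[<>!]?=|<>?|>)/",
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($split);
```

Empty quotes (`''`) produce an empty capture, which `PREG_SPLIT_NO_EMPTY` then drops.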

And yes, the distance between those two cities is 4,925 miles or 7,925 kilometers :D

Thank you! :D

emma
  • Possible duplicate of [PHP explode the string, but treat words in quotes as a single word](https://stackoverflow.com/questions/2202435/php-explode-the-string-but-treat-words-in-quotes-as-a-single-word) – iainn Apr 17 '18 at 10:48
  • (Specifically [this answer](https://stackoverflow.com/a/6609509/4680018)) – iainn Apr 17 '18 at 10:50
  • @iainn that answer doesn't keep my operators in the array :( For example, if I have `age=21` (note that there is no whitespace between age, = and 21), I want it to be split into `['age', '=', '21']` :P – emma Apr 17 '18 at 10:54
  • Please update your question so that it is clear for those who may like to offer you optimized solutions and for researchers that have a similar task. – mickmackusa Apr 28 '18 at 04:09

Regular Expression

(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)

You can see the matches here: https://regex101.com/r/LkHnHt/3
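The pattern above packs all of its alternatives into one line. A sketch of the same pattern written with PHP's `x` (extended) modifier so that each part can carry a comment; it is intended to match exactly what the one-line version matches, and the helper name `quotedTokens` is made up for this example:

```php
<?php
// The answer's pattern in extended (x) mode: whitespace outside character
// classes is ignored and "#" starts a comment that runs to the end of the line.
$pattern = <<<'REGEX'
/
    (?: '([^']*['s]?)'   # group 1: text inside single quotes
      | "([^"]*)"        # group 2: text inside double quotes
    )
  | [^\s,<>=!]+          # a word or number: anything but whitespace, comma or operator characters
  | (?: ,                # a comma
      | [<>!]?=          # =  <=  >=  !=
      | <>?              # <  <>
      | >                # >
    )
/x
REGEX;

// Hypothetical helper: returns the full matches, swapping in the quote-less
// text wherever group 1 or group 2 matched.
function quotedTokens(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $m); // default PREG_PATTERN_ORDER
    $tokens = [];
    foreach ($m[0] as $i => $full) {
        if ($m[1][$i] !== '') {
            $tokens[] = $m[1][$i];
        } elseif ($m[2][$i] !== '') {
            $tokens[] = $m[2][$i];
        } else {
            $tokens[] = $full;
        }
    }
    return $tokens;
}

print_r(quotedTokens($pattern, "age<21,length>10,name='Emma Einarsson' or time>=10"));
```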

PHP Code

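$text = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";
preg_match_all('/(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)/', $text, $matches);
// $matches[1] holds the text inside single quotes, $matches[2] the text inside double quotes;
// copy it over the full match in $matches[0] so the quotes are dropped
foreach (array_filter($matches[1]) + array_filter($matches[2]) as $k => $v) {
    $matches[0][$k] = $v;
}
print_r($matches[0]);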

Results (the second array is for the `age < 21, length > 10, ...` test string from the comments):

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => New York
    [15] => and
    [16] => Athene
    [17] => ?
)

Array
(
    [0] => age
    [1] => <
    [2] => 21
    [3] => ,
    [4] => length
    [5] => >
    [6] => 10
    [7] => ,
    [8] => height
    [9] => <>
    [10] => 10
    [11] => ,
    [12] => width
    [13] => !=
    [14] => 100
    [15] => ,
    [16] => name
    [17] => =
    [18] => Emma Einarsson
    [19] => or
    [20] => it
    [21] => can
    [22] => be
    [23] => words
    [24] => time
    [25] => >=
    [26] => 10
    [27] => ,
    [28] => clouds
    [29] => <=
    [30] => 4
)

Note that the final tokens are all stored in `$matches[0]`; `$matches[1]` only holds the text of single-quoted values and is empty for every other match.

Almog
  • Hey @Almog! :D The problem now is that I'm getting all the other keys empty `Array ( [0] => [1] => [2] => [3] => [4] => [5] => [6] => [7] => [8] => [9] => [10] => [11] => [12] => [13] => [14] => [15] => [16] => New York [17] => [18] => [19] => Athena [20] => )` X_X Can you please help me figure out what I'm doing wrong? :-s – emma Apr 17 '18 at 11:19
  • `$re = '/(?:\'([^\'|\'s]*)\')|(?:\"([^\"]*)\")|[^\s,]*|(?:,|[<>!]?=|<>?|>)/';preg_match_all($re, $string, $matches, PREG_SET_ORDER, 0); var_dump($matches);` @emma's regex – Hammurabi Apr 17 '18 at 11:27
  • Hey @Hammurabi! The problem now is that it puts both 'New York' and New York (without quotes) in that array :( – emma Apr 17 '18 at 11:31
  • Hey @Almog! :D So now it works with that string, but if I try to use this code on this other string `age > 21 and my state is 'New York' while my name = 'Emma Einarsson'`, my name doesn't behave like 'New York' and is split into two keys, `[12] => 'Emma` and `[13] => Einarsson'`... I think my name hates me X_X – emma Apr 17 '18 at 12:15
  • @emma It's probably because of the different regex. Can you give me a string with all the cases and the output you expect for it? Then I can cover all the cases at once :P – Almog Apr 17 '18 at 12:22
  • Yes :D Here: `$string = "age < 21, length > 10, height <> 10, width != 100, name = 'Emma Einarsson' or it can be words time >= 10, clouds <= 4"`. But I also need it to work when there are no spaces around the operators and commas, like this: `$string = "age<21,length>10,height<>10,width!=100,name='Emma Einarsson' or it can be words time>=10,clouds<=4"` :D – emma Apr 17 '18 at 12:34
  • Hey @Almog! :D Quick question :D I want to add +, -, [ and ] to this split. Can you please show me how to do that? :-s I've tried some things, but it seems I'm not smart enough for regular expressions X_X – emma Apr 19 '18 at 14:50
  • And also * and / (basically all comparison and arithmetic operators, with or without spaces around them) :D – emma Apr 19 '18 at 15:05
  • `(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<+\[\]\-\*\/>=!]+|(?:,|[<>!]?=|<>?|>|[+\-\[\]*\/])` – Almog Apr 21 '18 at 10:18
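The extended operator set from the last comments can be tried out with a short script. A sketch, assuming `+`, `-`, `*`, `/`, `[` and `]` should each become a one-character token; the function name `tokenizeWithArithmetic` is hypothetical. Inside a character class an unescaped `-` between two characters forms a range, so `[+-\[...]` would cover everything from `+` to `[` (including the digits, `<`, `=`, `>` and the uppercase letters); the `-` is escaped here for that reason:

```php
<?php
// Hypothetical helper. Both the bare-word class and the operator list gain
// + - * / [ ], with "-" escaped so it is not read as a character range.
function tokenizeWithArithmetic(string $text): array
{
    $pattern = '/(?:\'([^\']*[\'s]?)\'|"([^"]*)")'   // quoted values: groups 1 and 2
             . '|[^\s,<>=!+\-*\/\[\]]+'              // words and numbers
             . '|(?:,|[<>!]?=|<>?|>|[+\-*\/\[\]])/'; // comma, comparison, arithmetic, brackets
    preg_match_all($pattern, $text, $m);
    $tokens = [];
    foreach ($m[0] as $i => $full) {
        // Use the text without quotes when a quoted alternative matched.
        $tokens[] = $m[1][$i] !== '' ? $m[1][$i] : ($m[2][$i] !== '' ? $m[2][$i] : $full);
    }
    return $tokens;
}

print_r(tokenizeWithArithmetic("total=[price+tax]*2,name='Emma Einarsson',age>=21"));
```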

If I understand the requirements (after reading the question and the many comments), the only tricky part is preserving the single-quoted substrings.

You want to isolate:

  1. Single quote wrapped substrings that may contain spaces.
  2. Words that may contain an apostrophe (single quote)
  3. Numbers
  4. Five specific symbols: <, >, !, = and ?

Pattern: ~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i

Code with a battery of tests:

$strings = [
    "age<21,length>10,height<>10,width!=100,name='Emma Einarsson' or it can be words time>=10,clouds<=4",
    "age < 21, length > 10, height <> 10, width != 100, name = 'Emma Einarsson' or it can be words time >= 10, clouds <= 4",
    "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?",
    "'New York' and London at the start and end  with Paris and 'Los Angeles'"
];

foreach ($strings as $string) {
    var_export(preg_match_all("~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i", $string, $out) ? $out[0] : 'fail');
    echo "\n";
}

Pattern Breakdown:

~                 #start of pattern delimiter
\B'\K(?:[^']+)    #match a single quote not preceded by a word character [a-zA-Z0-9_], then reset the start of the reported match with \K (so the quote is left out), then match one or more non-single-quote characters
|                 #OR
\b[a-z']+\b       #match one or more letters and apostrophes 
|                 #OR
\d+               #match one or more digits
|                 #OR
[<>!=?]+          #match one or more of your listed operators/symbols
~                 #end of pattern delimiter
i                 #pattern modifier - make whole pattern case-insensitive
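The `\K` in the first alternative is what keeps the opening quote out of the result without needing a capture group. A small sketch comparing it with an ordinary capture group on one quoted value (the input string is taken from the comments on the question):

```php
<?php
$input = "name='Emma Einarsson' or it can be words";

// With \K the quote is consumed but dropped from the reported match.
preg_match("~\B'\K[^']+~", $input, $withK);

// Without \K the quote stays in the full match; the text is only in group 1.
preg_match("~\B'([^']+)~", $input, $withGroup);

echo $withK[0], "\n";     // Emma Einarsson
echo $withGroup[0], "\n"; // 'Emma Einarsson
echo $withGroup[1], "\n"; // Emma Einarsson
```

This is why `preg_match_all()` can return the unquoted value directly in `$out[0]`.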

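According to your sample input strings, you can technically remove the leading \b (word boundary marker) to improve pattern efficiency, but the trailing \b has to stay: without it, [a-z']+ would also match the closing quote of a value like 'New York' and add a stray ' element. I left both in for maximum accuracy.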
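The question's desired output keeps each comma as its own element (`[9] => ,`), but `,` is not among the characters this pattern matches, so commas are skipped. If they should be kept, one more alternative is enough; a sketch, assuming nothing else about the requirements changes:

```php
<?php
// The answer's pattern with one extra alternative, "," (assumption: commas
// should be kept as separate elements, as in the question's desired output).
$pattern = "~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+|,~i";

$string = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";
preg_match_all($pattern, $string, $out);
print_r($out[0]);
```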

mickmackusa
  • @emma this performs more efficiently than your accepted answer, but I would like you to verify that it works appropriately with your project data. The trouble with seeing unrealistic sample data is that we may write unnecessary components into the pattern. In the future please always offer very realistic input data (just obfuscate/redact any private data where needed). With a better understanding of the variability of your real data, I may be able to improve my answer. – mickmackusa Apr 28 '18 at 04:51