
For example, I have this:

$string = 'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic';

$array = array('html','css','javascript','ajax','html5','css3','jquery','PHP');

foreach($array as $ar){
   //Check if one of the $array values exists before the question mark '?' in the $string
}

I want to search only directly before the question mark "?" in $string. If no $array value sits immediately before the question mark, nothing should happen. "PHP" here could be any of the other values in $array, so I don't know in advance how long the matching word will be or where it appears; the word may also be repeated elsewhere in the string.

For example: $string = 'html .... , html is fantastic , Do you like html? , I love html'; Here the word is longer, and it could be longer still.

How can I find only the "PHP" that sits directly before the question mark and after "like" ['Do you like PHP ?'], whatever the length of the word?

Jan
Joe
  • 2
    Not to be picky, but there's also a space before the question mark. – Don't Panic Oct 25 '17 at 17:06
  • 1
    What happened with your [other question](https://stackoverflow.com/questions/46937148/how-to-search-the-end-of-a-string-for-a-text-exists-in-an-array)? – ishegg Oct 25 '17 at 17:11
  • Possible duplicate of [How to search the end of a string for a text exists in an array?](https://stackoverflow.com/questions/46937148/how-to-search-the-end-of-a-string-for-a-text-exists-in-an-array) – Jan Oct 25 '17 at 17:18
  • @ishegg: Great, haven't noticed. – Jan Oct 25 '17 at 17:19
  • @Jan , It's not duplicated , the two questions are totally different , the first is resolved but this is not . – Joe Oct 25 '17 at 17:28
  • @ishegg , maybe it's similar because of the way I explained it with , but the two questions are different – Joe Oct 25 '17 at 17:29
  • 1
    @Joe, if it's resolved, **accept the answer** that helped you the most. – ishegg Oct 25 '17 at 17:38
  • How about joining the array items into a single alternation regex. `$rgx = '(' . join('|',$array) . ')\s*\?';` then do a find all type of regex search, which will create an array of things found in the string. The advantage is that it only requires a single regex and a single pass. You'll have to pre-regex_escape all the items in the array first. –  Oct 25 '17 at 18:41
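
The single-regex idea from the last comment could look roughly like the sketch below. The function name `findTermsBeforeQuestionMark` is made up for illustration, and the `\b` word boundary and the case-insensitive `i` flag are assumptions about the desired matching rules rather than anything stated in the question.

```php
<?php
// Hypothetical helper: returns every search term that sits directly
// before a "?" (optionally separated from it by whitespace).
function findTermsBeforeQuestionMark(string $string, array $terms): array
{
    // Escape each term so characters such as "." or "+" match literally.
    $escaped = array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms);

    // \b keeps "html" from matching in the middle of a longer word;
    // \s* allows optional whitespace between the term and the "?".
    $pattern = '/\b(' . implode('|', $escaped) . ')\s*\?/i';

    preg_match_all($pattern, $string, $matches);

    // $matches[1] holds the text captured by the alternation group.
    return $matches[1];
}

$array = ['html', 'css', 'javascript', 'ajax', 'html5', 'css3', 'jquery', 'PHP'];

var_dump(findTermsBeforeQuestionMark(
    'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic',
    $array
));
```

Because the term's length never enters the pattern, the same call works for the longer `html` example from the question.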

1 Answer


You could do what you want with regular expressions, but if you tokenize the text you'll have more flexibility:

<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];

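// Convert to lowercase and pad punctuation with spaces.
// The hyphen sits last in the character class so it is a literal "-" and not a "'-_" range (which would also cover "?", "," and ".").
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' \1 ', strtolower($string));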

// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);

// Tokenize the text into words
$words = explode(' ', $tokenized_string);

// Find search terms directly preceding a question mark token
$question_words = array_filter(
    array_intersect($words, $find),
    function($k) use ($words) {
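        // Null coalescing (PHP 7+) avoids suppressing errors with @ when there is no next token
        return ($words[$k + 1] ?? null) === '?';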
    },
    ARRAY_FILTER_USE_KEY
);

// Output our matches
var_dump($question_words);

This creates a normalized array of tokens as $words, like:

array(30) {
  [0] =>
  string(3) "php"
  [1] =>
  string(2) "is"
  [2] =>
  string(1) "a"
  [3] =>
  string(6) "server"
  [4] =>
  string(4) "side"
  [5] =>
  string(3) "web"
  [6] =>
  string(11) "programming"
  [7] =>
  string(8) "language"
  [8] =>
  string(1) ","
  [9] =>
  string(2) "do"
  [10] =>
  string(3) "you"
  [11] =>
  string(4) "like"
  [12] =>
  string(3) "php"
  [13] =>
  string(1) "?"
  [14] =>
  string(1) ","
  [15] =>
  string(2) "do"
  [16] =>
  string(3) "you"
  [17] =>
  string(4) "like"
  [18] =>
  string(10) "javascript"
  [19] =>
  string(1) "?"
  [20] =>
  string(4) "what"
  [21] =>
  string(2) "is"
  [22] =>
  string(4) "ajax"
  [23] =>
  string(1) "?"
  [24] =>
  string(1) "?"
  [25] =>
  string(6) "coding"
  [26] =>
  string(2) "is"
  [27] =>
  string(3) "fun"
  [28] =>
  string(1) "."
  [29] =>
  string(0) ""
}

It returns an array of search terms found before a question mark, keyed by their position in the $words array:

array(3) {
  [12] =>
  string(3) "php"
  [18] =>
  string(10) "javascript"
  [22] =>
  string(4) "ajax"
}

This makes the assumption that you're not using search terms like node.js, which contain punctuation within them, although you could accommodate that fairly easily with this approach.

It also assumes you don't have any multi-word search terms like amazon s3. To support those, instead of using array_intersect() you could iterate over the question mark tokens with array_keys($words, '?') and compare the tokens preceding each one against your search terms, using each term's word count to decide how many tokens to look back.
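
A minimal sketch of that variation follows. The helper names `tokenize` and `termsBeforeQuestionMarks` are hypothetical, and the sketch assumes search terms are run through the same tokenizer as the text, which means a term such as node.js simply becomes the three tokens node, ".", js and is handled like any other multi-word term.

```php
<?php
// Hypothetical helper: lowercase the text, pad punctuation with spaces,
// then split on spaces while dropping empty tokens.
function tokenize(string $text): array
{
    $text = preg_replace("/([^a-zA-Z0-9'_ -])/", ' $1 ', strtolower($text));
    return preg_split('/ +/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: returns the matched terms keyed by the position
// of their first token in the tokenized text.
function termsBeforeQuestionMarks(string $text, array $terms): array
{
    $words = tokenize($text);

    // Tokenize each search term once, up front.
    $termTokens = [];
    foreach ($terms as $term) {
        $termTokens[$term] = tokenize($term);
    }

    $found = [];
    // Visit every "?" token and look back as many tokens as the term has.
    foreach (array_keys($words, '?', true) as $pos) {
        foreach ($termTokens as $term => $tokens) {
            $length = count($tokens);
            if ($length === 0 || $length > $pos) {
                continue; // not enough tokens before this "?"
            }
            if (array_slice($words, $pos - $length, $length) === $tokens) {
                $found[$pos - $length] = $term;
            }
        }
    }

    return $found;
}

var_dump(termsBeforeQuestionMarks(
    'Do you like PHP? Have you tried Amazon S3 ? Is it node.js?',
    ['php', 'amazon s3', 'node.js', 'css']
));
```

Looking backwards from each question mark keeps the work proportional to the number of question marks times the number of terms, at the cost of re-slicing the token array for every candidate.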

Jeff Standen