You could do what you want with regular expressions, but if you tokenize the text you'll have more flexibility:
<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];
// Convert to lowercase and pad punctuation with spaces.
// The hyphen goes last in the character class so it is a literal, not the range '-_
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' \1 ', strtolower($string));
// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);
// Tokenize the text into words
$words = explode(' ', $tokenized_string);
// Find search terms directly preceding a question mark token
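$question_words = array_filter(
    array_intersect($words, $find), // keeps the original $words keys
    function ($k) use ($words) {
        // The last token has no successor, so check that the next one exists
        return isset($words[$k + 1]) && $words[$k + 1] === '?';
    },
    ARRAY_FILTER_USE_KEY
);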
// Output our matches
var_dump($question_words);
This creates a normalized array of tokens in $words, like:
array(30) {
[0] =>
string(3) "php"
[1] =>
string(2) "is"
[2] =>
string(1) "a"
[3] =>
string(6) "server"
[4] =>
string(4) "side"
[5] =>
string(3) "web"
[6] =>
string(11) "programming"
[7] =>
string(8) "language"
[8] =>
string(1) ","
[9] =>
string(2) "do"
[10] =>
string(3) "you"
[11] =>
string(4) "like"
[12] =>
string(3) "php"
[13] =>
string(1) "?"
[14] =>
string(1) ","
[15] =>
string(2) "do"
[16] =>
string(3) "you"
[17] =>
string(4) "like"
[18] =>
string(10) "javascript"
[19] =>
string(1) "?"
[20] =>
string(4) "what"
[21] =>
string(2) "is"
[22] =>
string(4) "ajax"
[23] =>
string(1) "?"
[24] =>
string(1) "?"
[25] =>
string(6) "coding"
[26] =>
string(2) "is"
[27] =>
string(3) "fun"
[28] =>
string(1) "."
[29] =>
string(0) ""
}
It returns an array of the search terms found directly before a question mark, keyed by their position in the $words array:
array(3) {
[12] =>
string(3) "php"
[18] =>
string(10) "javascript"
[22] =>
string(4) "ajax"
}
This assumes you're not using search terms like node.js, which contain punctuation, although you could accommodate those fairly easily by adjusting the tokenizing step.
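One possible way to accommodate terms such as node.js is to tokenize with preg_match_all() instead of padding punctuation with spaces. The sketch below is only an illustration: the token pattern, the sample sentence and the rule that a dot between two word characters stays inside the token are assumptions, not part of the code above. It would also merge two sentences that are missing a space after the period.

```php
<?php
$string = 'Do you like Node.js? What about PHP? Coding is fun.';
$find = ['node.js', 'php'];

// Assumed token pattern: a run of word characters, optionally continued by
// ".word" segments, or else a single non-space punctuation character.
$word = "[a-z0-9'_-]+";
$pattern = "/{$word}(?:\\.{$word})*|[^\\sa-z0-9'_-]/";

// Collect every token in order; no empty tokens are produced
preg_match_all($pattern, strtolower($string), $m);
$words = $m[0];

// Same filter as before: keep search terms directly followed by a "?" token
$question_words = array_filter(
    array_intersect($words, $find),
    function ($k) use ($words) {
        return isset($words[$k + 1]) && $words[$k + 1] === '?';
    },
    ARRAY_FILTER_USE_KEY
);

print_r($question_words);
```

Here the trailing period in "fun." still becomes its own token, because the dot is only kept when more word characters follow it.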
It also assumes you don't have any multi-word search terms like amazon s3. To handle those, instead of using array_intersect() you could iterate over the question mark tokens with array_keys($words, '?') and check whether the tokens preceding each one match a search term, comparing as many tokens as the term has words.
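As a rough sketch of that idea, the function below walks the question mark tokens and compares the tokens in front of each one against every search term. The function name, the sample token list and the choice to key results by the term's first token are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns search terms that end directly before a "?"
// token, keyed by the index of the term's first token in $words.
function find_terms_before_question_marks(array $words, array $find)
{
    $matches = [];
    // Strict search so only tokens that are exactly the string "?" are returned
    foreach (array_keys($words, '?', true) as $q) {
        foreach ($find as $term) {
            // Split the term into tokens, e.g. "amazon s3" => ['amazon', 's3']
            $tokens = explode(' ', strtolower($term));
            $start = $q - count($tokens);
            // Compare the tokens directly preceding the question mark
            if ($start >= 0 && array_slice($words, $start, count($tokens)) === $tokens) {
                $matches[$start] = $term;
            }
        }
    }
    return $matches;
}

// Assumed sample input, already lowercased and tokenized
$words = explode(' ', 'do you use amazon s3 ? i like php ? what is ajax ? ? coding is fun .');
$find = ['php', 'ajax', 'amazon s3'];

print_r(find_terms_before_question_marks($words, $find));
```

With this sample the second of the two consecutive question marks matches nothing, since the token before it is itself a question mark.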