0

I'm trying to match all the words on a line that come after the text "After this match", except those containing a digit. For example, given this line:

After this match word word1 worldtwo word3 word4 wordfive 502 875 

I want to match only the words without digits, so the result should be:

word worldtwo wordfive

The number of words on the line can vary.

I tried After this match ([a-zA-Z]*), but it matched only one word.

Please see here : http://www.rubular.com/r/HykbS2Eajk

I'm using CakePHP, but I need to do this with a regex alone.

amorino

5 Answers

3

You can use this pattern:

(?:match|\G(?<!^)).*?(\b[a-zA-Z]+\b)

It's a variant of THIS "almost" general method... You can check it for more details...
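As a rough illustration of how this pattern behaves in PHP, the sketch below runs it with preg_match_all on the sample line from the question and reads capture group 1. The variable names ($line, $pattern, $m, $words) are only illustrative.

```php
<?php
// Sample line from the question; all variable names here are illustrative.
$line = 'After this match word word1 worldtwo word3 word4 wordfive 502 875';

// First alternative: start right after the literal "match".
// Second alternative: \G resumes where the previous match ended, and (?<!^)
// keeps \G from matching at the very start of the subject.
// .*? then skips ahead lazily to the next letters-only word (group 1).
$pattern = '/(?:match|\G(?<!^)).*?(\b[a-zA-Z]+\b)/';

preg_match_all($pattern, $line, $m);

// Group 1 holds the digit-free words found after "match".
$words = $m[1];
print_r($words);
```

On the sample line this should give word, worldtwo and wordfive. Because `.` does not match a newline by default, each chain of matches stops at the end of its line, but a later occurrence of "match" starts a new chain.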

Enissay
  • I took some code from your answer and modified my PHP like this: `preg_match_all ("/(?:After this match|\G(?<!^)).*?(\b[a-zA-Z]+\b)/i", $content, $docteur); //1 print_r( $docteur[1][0].' '.$docteur[1][1].' '.$docteur[1][2].' '.$docteur[1][3].' '.$docteur[1][4].' '.$docteur[1][5]);` So it takes the first six words (indexes 0 to 5) found on the line that meet my criteria (no digits). – amorino Nov 27 '13 at 15:40
  • @Enissay what does `\G` stand for? – Tafari Nov 27 '13 at 16:12
  • 1
    @Tafari: `\G` matches at the position where the previous match ended. You can read more about it [`HERE`](http://www.regular-expressions.info/continue.html) – Enissay Nov 27 '13 at 16:13
  • Hello Enissay, it's searching the whole document and not only the line that contains "After this match". So if "After this match" occurs twice, it matches both. I want it to stop at the first such line and not search the lines after it. Any help, please? – amorino Dec 05 '13 at 02:26
  • @Enissay Here is an example of the new problem, thank you: http://www.rubular.com/r/FZjaBZUDXM – amorino Dec 05 '13 at 02:28
0

You can use word boundaries:

 (\b[a-zA-Z]+\b)

A word boundary is a zero-width position between a word character and a non-word character. Word characters are [a-zA-Z0-9_]; since this class contains digits too, there is no word boundary between a letter and a digit.
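To see this behaviour in PHP, here is a minimal sketch. The input is the word list from the question without the leading marker, and the variable names are illustrative.

```php
<?php
// Illustrative input: the words from the question, without the leading marker.
$text = 'word word1 worldtwo word3 word4 wordfive 502 875';

// \b[a-zA-Z]+\b needs a boundary on both sides of a run of letters.
// In "word1" there is no boundary between "d" and "1", so it is skipped.
preg_match_all('/\b[a-zA-Z]+\b/', $text, $m);

// $m[0] holds the full matches.
$words = $m[0];
print_r($words);
```

This should print word, worldtwo and wordfive. Run on the full line, the same pattern would also return "After", "this" and "match", which is why a second step is needed to start after the marker.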

To get the result you want after a preceding match (After this match in your example), you can use this pattern (in PHP, not Rubular):

/(?>After this match|\G(?<!^)(?>\W*\S*[0-9]\S*)*)\W+\K\b[a-z]+\b/i
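A minimal sketch of how this pattern could be called from PHP, assuming the sample line from the question (variable names are illustrative):

```php
<?php
$line = 'After this match word word1 worldtwo word3 word4 wordfive 502 875';

// (?>\W*\S*[0-9]\S*)* consumes any tokens that contain a digit and sit
// between the previous match and the next candidate word.
// \K then drops everything matched so far, so only the word is reported.
$pattern = '/(?>After this match|\G(?<!^)(?>\W*\S*[0-9]\S*)*)\W+\K\b[a-z]+\b/i';

// preg_match_all is needed; preg_match would stop after the first word.
preg_match_all($pattern, $line, $m);

// Because of \K there is no capture group: the words are the full matches.
$words = $m[0];
print_r($words);
```

Here the words end up in `$m[0]` (word, worldtwo, wordfive on the sample line) rather than in a numbered group.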
Casimir et Hippolyte
  • 1
    Use `+` instead of `*`! – devnull Nov 27 '13 at 11:28
  • Thank you, but it takes only one word. Please see it in action here: http://www.rubular.com/r/kbRNdZVJdc – amorino Nov 27 '13 at 11:30
  • 1
    For this to work, you'll need to strip off `"After this select "` from the source string first. – techfoobar Nov 27 '13 at 11:32
  • It worked on Rubular, but in PHP it took just one word. Should I add something? `preg_match ("/(?>Demandé\s+par\s+:\s+Dr\.|\G(?<!^)(?>\W*\S*[0-9]\S*)*)\W+\K\b[a-z]+\b/iu", $content, $docteur); //1 print_r( $docteur);` Source: `Demandé par : Dr. Docteur Doctorant POP1` It matches only Docteur – amorino Nov 27 '13 at 12:02
  • 1
    @amorino: use preg_match_all instead of preg_match. – Casimir et Hippolyte Nov 27 '13 at 12:05
  • @Casimir thank you. The document is very long, and with preg_match_all it will match far too many words. Is there another method, please? Thanks – amorino Nov 27 '13 at 12:28
  • It should match only up to the end of the line, no further – amorino Nov 27 '13 at 12:31
  • @amorino: If you want to stop at the end of the line, replace every `\W` with `[^\w\n]` – Casimir et Hippolyte Nov 27 '13 at 12:44
  • @Casimir thank you, I'm almost there. I get: `Array ( [0] => Array ( [0] => word [1] => Wordtwo ) )` How can I get at the [0], [1], [2]... entries in the array under [0]? – amorino Nov 27 '13 at 13:32
0
After this select.*?\s([^\d\s]+)(?:\s|$)

We first match 'After this select', then reluctantly (as few as possible) any characters, then a whitespace character. Next we capture a run of characters that are neither digits nor whitespace (in other words, a word without digits), and finally require a whitespace character or the end-of-string anchor, to make sure we don't capture parts of words.

If you match repeatedly, the string you're looking for will be in your captured groups.

SQB
  • @amorino Yes, it does. Apply this repeatedly to get all words. I assumed that in your example, you want to match 'word', 'worldtwo', 'wordfive'. – SQB Nov 27 '13 at 11:35
  • Yes, I did this in PHP: preg_match ("/After this match.*?\s([^\d\s]+)(?:\s|$)/iu", $content, $docteur); //1 print_r( $docteur); But it matched only one word. Have I missed something, SQB? – amorino Nov 27 '13 at 11:44
0

Hmm you could do this using two regex patterns:

INPUT

After this match word word1 worldtwo word3 word4 wordfive 502 875

First, get all the characters after "After this match":

PATTERN

(?<=After this match )(.+?$)

OUTPUT

word word1 worldtwo word3 word4 wordfive 502 875

Then use the second pattern to get the words without numbers:

PATTERN

\b[^\d\s]+?\b

OUTPUT

word
worldtwo
wordfive

Tested it here:

gskinner.com/RegExr/
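The two steps above could be chained in PHP roughly as follows. This is only a sketch on the single sample line, with illustrative variable names.

```php
<?php
$line = 'After this match word word1 worldtwo word3 word4 wordfive 502 875';

$words = [];
// Step 1: capture everything after the marker, up to the end of the string.
if (preg_match('/(?<=After this match )(.+?$)/', $line, $rest)) {
    // Step 2: collect runs of non-digit, non-space characters that sit
    // between two word boundaries.
    preg_match_all('/\b[^\d\s]+?\b/', $rest[1], $m);
    $words = $m[0];
}
print_r($words);
```

On the sample line this should give word, worldtwo and wordfive. For a multi-line document, `$` would need the `m` modifier to stop at the end of the marker line; that case is not covered by this sketch.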

I'm trying to come up with a single-pattern version now; if I manage it, I'll edit my post : )

EDIT

Here is a one-regex version:

(?:(?<!\s)After this match|\G).+?(\b[^\d\s]+?\b)

The matched words are in group 1.

OUTPUT

Match 1: After this match word Group 1: word

Match 2: word1 worldtwo Group 1: worldtwo

Match 3: word3 word4 wordfive Group 1: wordfive
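In PHP, reading group 1 from this pattern might look like the sketch below (variable names are illustrative):

```php
<?php
$line = 'After this match word word1 worldtwo word3 word4 wordfive 502 875';

// Either start at the marker or continue from the previous match (\G),
// skip ahead lazily, then capture the next digit-free word in group 1.
$pattern = '/(?:(?<!\s)After this match|\G).+?(\b[^\d\s]+?\b)/';

preg_match_all($pattern, $line, $m);

// $m[0] holds the full matches, $m[1] the captured words.
$words = $m[1];
print_r($words);
```

This sketch assumes the subject starts with the marker. The `\G` here has no `(?<!^)` guard, so it also matches at offset 0, and if other text came first, words before the marker could be captured too.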

Tafari
  • @amorino OK, I tried for about an hour but didn't manage to put it into one regex, so I'm not sure it's possible to get the output in one go. I have some work to do right now, so I'll try again a bit later, but does the current solution work for you? – Tafari Nov 27 '13 at 14:44
  • Thank you very much, Tafari. I took some code from your answer and modified my PHP like this: `preg_match_all ("/(?:After this match|\G(?<!^)).*?(\b[a-zA-Z]+\b)/i", $content, $docteur); //1 print_r( $docteur[1][0].' '.$docteur[1][1].' '.$docteur[1][2].' '.$docteur[1][3].' '.$docteur[1][4].' '.$docteur[1][5]);` So it takes the first six words (indexes 0 to 5) found on the line that meet my criteria (no digits). I will use this until you find the one-regex version ;) Best regards, Amorino – amorino Nov 27 '13 at 15:42
  • @amorino added the one regex version : ) – Tafari Nov 28 '13 at 08:05
  • @amorino It has to work as **php regex patterns** are no different than others afaik. But have you used **groups**, to get the certain words? I think it would be something like: `preg_match(pattern, $str, $re); print_r($re[1]);` And you can test my **pattern** on the site, which you posted for Enissay: [www.rubular.com](http://www.rubular.com/r/rWZmyydyqQ) – Tafari Dec 05 '13 at 07:53
0

I took some code from the answers and modified my PHP like this:

// Group 1 collects each letters-only word that follows "After this match".
preg_match_all('/(?:After this match|\G(?<!^)).*?(\b[a-zA-Z]+\b)/i', $content, $docteur);
// Print at most the first six captured words; array_slice avoids undefined-offset notices.
print_r(implode(' ', array_slice($docteur[1], 0, 6)));

So it takes the first six words found on the line that meet my criteria (no digits).

amorino