
I am looking for a PCRE pattern that I can use from C to extract the tail fragment of a string. I want to capture whatever follows the string "en". The "en" may be followed by nothing at all, or by one or more slashes "/" with or without further text. Any slashes immediately after "en" should be trimmed off before the captured string is returned. The input consists only of lowercase ASCII characters.

input-string        match   captured-string
---------------------------------------
english/japan       no
en                  yes
en/                 yes
en/japan            yes     japan
en//japan           yes     japan
en/japan/tokyo      yes     japan/tokyo
en//japan/tokyo     yes     japan/tokyo
en//                yes

Thank you in advance!

Masao Liu

2 Answers


^en(?:/+(.+)|/?)$

^    # beginning of line
  en    # literal 'en'
   (?:    # start of a non-capturing group
     /+(.+)    # '/' one or more times, then any character one or more times (capturing group)
     |    # OR
     /?    # '/' zero or one time
   )    # end of the non-capturing group
$    # end of line
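
For inputs such as `en//`, the pattern above lets `(.+)` pick up a slash, because `/+` can give one back when backtracking. Here is a minimal sketch of one possible way around that, which is not the answerer's pattern: require the captured tail to start with a non-slash character, so that every slash after `en` is consumed by `/+`. The sketch assumes POSIX `regcomp`/`regexec` from `<regex.h>` rather than PCRE, only because it needs no extra library. The equivalent PCRE spelling would presumably be `^en(?:/+([^/].*)?)?$`, with the tail in group 1. The helper name `extract_tail` and its return convention are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original answer).
 * Returns 1 if input matches, 0 if it does not, -1 on error.
 * On a match, the tail after "en" and any slashes directly
 * following it is copied into out (truncated to fit outsz).
 */
int extract_tail(const char *input, char *out, size_t outsz)
{
    /* POSIX ERE has no (?:...), so the tail is group 2 here.
     * [^/] forces the tail to begin with a non-slash character. */
    static const char *pattern = "^en(/+([^/].*)?)?$";
    regex_t re;
    regmatch_t m[3];
    int rc;

    if (out == NULL || outsz == 0)
        return -1;
    out[0] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, input, 3, m, 0);
    if (rc == 0 && m[2].rm_so != -1) {
        /* Group 2 took part in the match: copy it out. */
        size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (len >= outsz)
            len = outsz - 1;
        memcpy(out, input + m[2].rm_so, len);
        out[len] = '\0';
    }
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With a buffer `char buf[64]`, calling `extract_tail("en//japan/tokyo", buf, sizeof buf)` should return 1 and leave `japan/tokyo` in `buf`, while `english/japan` should return 0 and `en//` should return 1 with an empty `buf`.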
polkduran
  • Thank you for the intuitive explanation! I should have mentioned that any leading `/` in the captured string should be trimmed off. Given input `en//` or `en///`, `^en(?:/+(.+)|/?)$` captures `/` as the first captured group `$1`. The desired result is an empty string instead. – Masao Liu Oct 23 '13 at 09:57
echo "en//japan/tokyo" | sed -rn 's;^en($|/+(.*));\2;p'
Yann Moisan
  • I have just tested `en/*(.*)` on http://www.freeformatter.com/regex-tester.html and I seem to be unable to get the correct result from it. Input string `english/japan/tokyo` should not match, but it matches and returns the captured string `glish/japan/tokyo`. – Masao Liu Oct 21 '13 at 13:40
  • Thank you for the intuitive offer! – Masao Liu Oct 23 '13 at 08:56
  • Thank you for the intuitive offer! I should have mentioned that any leading `/` in the captured string should ideally be trimmed off, too. Given input `en//` or `en///`, the first captured group `$1` is `/`. The desired result is an empty string instead. The framework I am using will probably trim off the leading slashes for me. Since I do not want to take up too much of your time, I will mark your answer as the correct one. However, if better versions are still available, they are of course very welcome. – Masao Liu Oct 23 '13 at 09:08
  • Oops! sed works but http://www.freeformatter.com/regex-tester.html doesn't. Perhaps they use different engines? – Masao Liu Oct 23 '13 at 09:19
  • Sorry! I must have been at my PC for too long, and I mixed up your answer with polkduran's. Your answer is correct. – Masao Liu Oct 23 '13 at 09:51