-2

I need a regex pattern to match the anchor text between the tags in <a href="https://website.com">Health & Beauty</a>. The text may or may not include spaces and/or the special character "&", but it should not exceed a limit of 10 characters. In that case, I would want to extract:

Beauty & Fashion

The following is a regex pattern to extract anchor text:

(<[a|A][^>]*>|)

But I want to limit the text to between 1 and 10 characters. Is that possible?

2 Answers

1

For PCRE:

https://regex101.com/r/GJSlZl/1

For JS:

https://regex101.com/r/FIdlyU/1

The solution depends on the regex flavor:

  • js: (?<=<a[^>]+>)([\w &]{1,10})(?=<\/a>)
  • pcre: <a[^>]+>\K([\w &]{1,10})(?=<\/a>)
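
As a rough illustration of the JS variant, the sketch below runs it in Node.js against two made-up anchors. The function name extractAnchorTexts and the sample strings are assumptions for this example and are not part of the answer; lookbehind also needs a reasonably recent JavaScript engine.

```javascript
// Sketch: extract anchor text of 1 to 10 characters (word chars, spaces, "&").
// The pattern is the JS variant from the answer, with the global flag added.
const anchorText = /(?<=<a[^>]+>)([\w &]{1,10})(?=<\/a>)/g;

// Hypothetical helper: returns the captured text of every match.
function extractAnchorTexts(html) {
  return [...html.matchAll(anchorText)].map((m) => m[1]);
}

// Invented sample input: one 10-character text and one 15-character text.
const sample = [
  '<a href="https://website.com">Hair & Spa</a>',
  '<a href="https://website.com">Health & Beauty</a>',
].join("\n");

console.log(extractAnchorTexts(sample)); // [ 'Hair & Spa' ]
```

"Health & Beauty" is 15 characters long, so the {1,10} quantifier together with the lookahead for </a> rejects it. Note that <a[^>]+> needs at least one character after "<a", so a bare <a> tag with no attributes would not match either.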
Andre Elrico
  • Thanks for your help! I really appreciate it. Is it possible for it not to extract anything that contains a number, like January 2019? – Sabahat Ali Sep 06 '19 at 00:18
0

My guess is that you're looking for an expression similar to:

(?<=&|>)([^&\r\n]{0,10}(?=&|<\/a>))*

to which you might want to add more boundaries on the left side, in the lookbehind:

(?<=&|>) 

Test

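// Runs of up to 10 characters other than "&" and line breaks,
// preceded by "&" or ">" and followed by "&" or "</a>".
$re = '/(?<=&|>)([^&\r\n]{0,10}(?=&|<\/a>))*/s';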
$str = '<a>Health & Beauty</a>
<a href="https://website.com">Health & Beauty</a>
<a href="https://website.com">Health & Beauty 1 & Health & Beauty 1 </a>
<a>Health & Beauty 1 & Health & Beauty 1 </a>
<a>Health & Beauty 1 & Some other words & Beauty 1 & Some other words 2</a>

';

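// Collect every match; PREG_SET_ORDER groups the results per match.
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the matches for inspection.
var_dump($matches);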

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.


Emma
  • Thanks for your help! I really appreciate it. Is it possible for it not to extract anything that contains a number, like January 2019? It should also contain at most 3 spaces. – Sabahat Ali Sep 06 '19 at 17:27
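
One possible way to handle the follow-up requests (no digits, at most 3 spaces) is sketched below. It is an assumption built on the JS pattern from the first answer, not something either answerer posted: \w is swapped for a letters-only class, which also drops the underscore, and a negative lookahead rejects text containing four or more spaces. The name extractLetterAnchors and the sample anchors are made up for illustration.

```javascript
// Sketch only: letters, spaces and "&" (no digits), 1 to 10 characters,
// and at most 3 spaces inside the anchor text.
// (?!(?:[^< ]* ){4}) fails when four spaces occur before the next "<".
const lettersOnly =
  /(?<=<a[^>]+>)(?!(?:[^< ]* ){4})([A-Za-z &]{1,10})(?=<\/a>)/g;

// Hypothetical helper: returns the captured text of every match.
function extractLetterAnchors(html) {
  return [...html.matchAll(lettersOnly)].map((m) => m[1]);
}

// Invented sample input.
const samples = [
  '<a href="#">Hair & Spa</a>', // 2 spaces, no digits: kept
  '<a href="#">Jan 2019</a>',   // contains digits: skipped
  '<a href="#">a b c d e</a>',  // 4 spaces: skipped
].join("\n");

console.log(extractLetterAnchors(samples)); // [ 'Hair & Spa' ]
```

The character class only covers ASCII letters; accented or non-Latin anchor text would need a broader class, for example \p{L} with the u flag.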