I have links like these:

<a href="#" class="social google">Google</a>
<a href="#" class="social yahoo">Yahoo</a>
<a href="#" class="social facebook">Facebook</a>

Now I want to match only the anchor text using a regex.
For example, in the first link it should match only the text Google.

I have tried this:

(?<=<a href="#" class="social .+?">).+?(?=</a>)

But it's not working as expected.

Can anyone give me the correct syntax?

PrivateUser
  • Do you want only the a elements that have class="social"? – Rui Jarimba Feb 11 '13 at 15:10
  • @Giri: As I said in your previous question, it is not possible to match only the text inside the tag if the content and the class are arbitrary. There is simply no support for that. – nhahtdh Feb 11 '13 at 15:32
  • @nhahtdh Yes, I do understand, but I'm looking for alternative solutions. I think this solution will work: http://stackoverflow.com/a/14814906/736037 – PrivateUser Feb 11 '13 at 15:36
  • @Giri: It is the same as the solution the other user provided in the previous question (after the edit). – nhahtdh Feb 11 '13 at 15:37
  • @nhahtdh And I think it's the same solution you mentioned: `Usually capturing groups would be sufficient for most replacement scenarios`. Since I'm new to regex, I didn't understand it the first time. – PrivateUser Feb 11 '13 at 15:39
  • @nhahtdh If you don't mind, can you give me the syntax to capture the group? I mean, this code is not working: `` – PrivateUser Feb 11 '13 at 15:45
  • @Giri: After matching it - what are you trying to do? Without this information, I cannot suggest anything. – nhahtdh Feb 11 '13 at 15:47
  • @nhahtdh I just want to replace the text. I'm using multiple cursors. – PrivateUser Feb 11 '13 at 16:04

4 Answers

Instead of using look-behind and look-ahead to exclude the parts you don't want, I suggest using a capture group to get only the part you want:

<a href="#" class="social .+?">(.+?)</a>

Conceptually, look-arounds are used for overlapping matches. It doesn't appear that you need their functionality here.

(Of course, the usual caveats about parsing HTML with regular expressions apply.)
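A minimal sketch of the capture-group approach, using Python's `re` module purely for illustration (the engine used in the question may expose groups through a different API). The variable names are hypothetical:

```python
import re

# The three links from the question, one per line.
html = '''<a href="#" class="social google">Google</a>
<a href="#" class="social yahoo">Yahoo</a>
<a href="#" class="social facebook">Facebook</a>'''

# No look-arounds: the prefix and the closing tag are consumed by the match,
# and only capturing group 1 holds the anchor text.
pattern = re.compile(r'<a href="#" class="social .+?">(.+?)</a>')

# With exactly one capturing group, findall() returns that group per match.
texts = pattern.findall(html)
print(texts)  # ['Google', 'Yahoo', 'Facebook']
```

Because the pattern has a single capturing group, `findall()` hands back only the text between the tags rather than the whole match.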

Update: this is not only a matter of best practice. A regex using a variable-length look-behind will actually produce incorrect results, because it allows the look-behind portion to overlap other matches. Consider this input:

<a href="#" class="social google">Google</a>

...

<a class="bad">foo</a>

Your regex will not only match "Google"; it will also match "foo" because the .+? that is supposed to match only part of the class string can expand all the way to another link in the text.
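Python's `re` (again only an illustration) does not accept variable-width look-behind, so it cannot reproduce the overlap directly: it rejects the question's pattern at compile time. The sketch below shows that rejection and checks that the capture-group pattern returns only "Google" for a one-line version of the input above. Putting both links on one line is an assumption made so that `.` never has to cross a newline:

```python
import re

# One-line version of the example input (assumption: no newlines in between).
html = '<a href="#" class="social google">Google</a> <a class="bad">foo</a>'

# The question's pattern: the look-behind contains a variable-length ".+?",
# which Python's re module refuses to compile.
lookbehind = r'(?<=<a href="#" class="social .+?">).+?(?=</a>)'
try:
    re.compile(lookbehind)
    lookbehind_supported = True
except re.error as exc:
    lookbehind_supported = False
    print('re.error:', exc)

# The capture-group version consumes the prefix as part of each match, so the
# search cannot reuse the "social" prefix to reach the "bad" link.
capture = re.compile(r'<a href="#" class="social .+?">(.+?)</a>')
print(capture.findall(html))  # ['Google']
```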

Community
  • Hi, when I use your code `` it still selects the whole text, tags included. Can you tell me what's wrong? – PrivateUser Feb 11 '13 at 15:17
  • @Giri, you need to use the correct Boost function to get the captured subgroup instead of the whole thing. I'm not a Boost user, but it looks like they show how to do that here: http://www.boost.org/doc/libs/1_33_1/libs/regex/doc/captures.html –  Feb 11 '13 at 15:23
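The whole-match versus subgroup distinction is not specific to Boost. A minimal Python sketch (illustrative only; "New text" is a hypothetical placeholder) shows it, and also shows how a replacement can keep the tags by writing captured groups back with backreferences, since the asker's stated goal was to replace the text:

```python
import re

html = '<a href="#" class="social google">Google</a>'

# Group 0 is the whole match (tags included); group 1 is only the anchor text.
pattern = re.compile(r'<a href="#" class="social .+?">(.+?)</a>')
match = pattern.search(html)
print(match.group(0))  # <a href="#" class="social google">Google</a>
print(match.group(1))  # Google

# For a replacement, capture the parts to keep and write them back with
# \1 and \3; only the middle group is replaced.
keep_tags = re.compile(r'(<a href="#" class="social .+?">)(.+?)(</a>)')
replaced = keep_tags.sub(r'\1New text\3', html)
print(replaced)  # <a href="#" class="social google">New text</a>
```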

Try this:

  "~<a(>| .*?>)(.*?)</a>~si"

or

   "/<a(>| .*?>)(.*?)</a>/"

PHP sample:

  $notecomments = '<a id="234" class="asf">fdgsd</a> <a>fdgsd</a>';

  // Group 2 holds the anchor text; the callback echoes it and replaces each match with ''.
  $output = preg_replace_callback('~<a(>| .*?>)(.*?)</a>~si', function ($matches) {
      echo $matches[2], "\n";
      return '';
  }, $notecomments);

This gives you the text of every anchor.
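For readers not using PHP, a rough Python equivalent of the first pattern (an illustration, not part of the original answer): `re.S` and `re.I` play the role of the `s` and `i` modifiers after the closing `~`, and group 2 holds the anchor text.

```python
import re

notecomments = '<a id="234" class="asf">fdgsd</a> <a>fdgsd</a>'

# Same pattern as the PHP answer; re.S lets "." match newlines and re.I makes
# the match case-insensitive, like the ~si modifiers.
anchor = re.compile(r'<a(>| .*?>)(.*?)</a>', re.S | re.I)

# Group 1 is the rest of the opening tag; group 2 is the anchor text.
texts = [m.group(2) for m in anchor.finditer(notecomments)]
print(texts)  # ['fdgsd', 'fdgsd']
```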

And this returns only the text of anchors whose class contains "social":

  "#<a .*?class=\".*?social.*?\".*?>(.*?)</a>#"

Sample:

  $notecomments = '<a id="234" class="fas social ads">fdgsd</a> <a>fdgsd</a>';

  // Only anchors whose class attribute contains "social" match; $matches[1] is the anchor text.
  $output = preg_replace_callback('#<a .*?class=".*?social.*?".*?>(.*?)</a>#', function ($matches) {
      print_r($matches);
      return '';
  }, $notecomments);
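The same class filter expressed in Python, as a sketch for comparison (the pattern is the one above without PHP's `#` delimiters and escaped quotes):

```python
import re

notecomments = '<a id="234" class="fas social ads">fdgsd</a> <a>fdgsd</a>'

# Only anchors with a class attribute containing "social" can match;
# the single capturing group is the anchor text.
social = re.compile(r'<a .*?class=".*?social.*?".*?>(.*?)</a>')
print(social.findall(notecomments))  # ['fdgsd']
```

The bare `<a>` link has no space after `<a`, so it cannot match `<a ` and is skipped.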
mohammad mohsenipur

You are probably getting the correct results, but because you have other matching groups (?...), your matches also contain data you don't want.

You could try using non-capturing groups (?:...) and putting what you want to appear in the match inside a capturing group of its own, (.+?).
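A minimal Python sketch of the difference between a non-capturing `(?:...)` group and a capturing one. It is illustrative only; the class sub-pattern `social [^"]+` is an assumption chosen for the example:

```python
import re

html = '<a href="#" class="social google">Google</a>'

# (?:...) groups without capturing, so the only capture is the anchor text.
non_capturing = re.compile(r'<a href="#" class="(?:social [^"]+)">(.+?)</a>')
match = non_capturing.search(html)
print(match.groups())  # ('Google',)

# With an ordinary capturing group, the class value is captured as well.
capturing = re.compile(r'<a href="#" class="(social [^"]+)">(.+?)</a>')
print(capturing.search(html).groups())  # ('social google', 'Google')
```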

Louis Ricci
  • Actually, the variable-length look-behind can overlap between matches, causing the regex to erroneously match other links and other elements in the page. –  Feb 11 '13 at 15:52

Try this regular expression:

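<a .*?>(.*?)</a>

Edit 1: this regex matches only the anchors that have the CSS class "social":

<a .*?class=".*?\bsocial\b.*?>(.*?)</a>

(None of `<`, `>` or `/` needs a backslash here. In some engines, such as Boost's Perl syntax, `\<` and `\>` are word-boundary assertions rather than literal angle brackets.)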
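To see what the `\b` word boundaries add, here is a sketch in Python comparing a plain substring test for "social" with the whole-word version (written without the unneeded backslashes). The second link, with class "antisocial", is a made-up input:

```python
import re

html = ('<a href="#" class="social google">Google</a> '
        '<a href="#" class="antisocial">Other</a>')

# Substring test: "social" anywhere in the class, so "antisocial" passes too.
substring = re.compile(r'<a .*?class=".*?social.*?".*?>(.*?)</a>')
# Whole-word test: \b requires "social" to start and end at word boundaries.
whole_word = re.compile(r'<a .*?class=".*?\bsocial\b.*?>(.*?)</a>')

print(substring.findall(html))   # ['Google', 'Other']
print(whole_word.findall(html))  # ['Google']
```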
Rui Jarimba