-3

Case 1:

Sample html : <a href="https://www.jessussaveme.com/saveme/c-from.html?random[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>

Expected output :

https://www.jessussaveme.com/saveme/c-from.html?random[for_god_sake_save_me]=anyonethere&no=fr&lang=fr

Case 2:

Sample html : <a href="https://www.jessussaveme.com/saveme/c-from.html?random[]=anyonethere&no=fr&lang=fr">Test</a>

Expected output: nothing. A link should not contain empty square brackets []

Case 3:

Sample html : <a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>

Expected Output: https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr

Which links should be chosen: 1. Links that do not contain any square brackets '[]', OR 2. Links that contain a non-empty square bracket pair '[Some_random_text]'.

Links that should not be picked: links that contain an empty square bracket pair [].

Mukyuu
Manish Chauhan

2 Answers

0

Rather than regex, you can use jQuery to do so:

$("a").each(function() { // iterates over all <a> elements
  var href = $(this).attr('href') || ''; // anchors without an href attribute yield undefined
  console.log(href.includes('[]') ? '' : href); // print the link unless it contains "[]"
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>

<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>

<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>

Unless you can first extract the href text on its own, you shouldn't use regex to parse HTML.
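Once the href values are available as plain strings, the same check no longer needs jQuery or a DOM. The following is only a minimal sketch of that idea; the helper name `pickLinks` and the `sample` array are hypothetical and were not part of the original answer.

```javascript
// Hypothetical helper: keeps only hrefs that do not contain an empty "[]".
function pickLinks(hrefs) {
  return hrefs.filter(function (href) {
    // Skip non-string entries (e.g. a missing href) instead of throwing.
    return typeof href === "string" && !href.includes("[]");
  });
}

// The three cases from the question.
const sample = [
  "https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr",
  "https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr",
  "https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr"
];

console.log(pickLinks(sample)); // logs the first and third URLs
```

Non-empty brackets such as `[for_god_sake_save_me]` pass because only the literal two-character sequence `[]` is rejected.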


Since you already said that you use PHP, you could try the following method to extract the URLs:

$html = '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>

    <a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>

    <a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>';

$hrefs = array();

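$dom = new DOMDocument();
libxml_use_internal_errors(true); // assumption: silences parser warnings libxml may emit for markup such as an unescaped '&'
$dom->loadHTML($html);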

$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
       $hrefs[] =  $tag->getAttribute('href');
}

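Then check whether each link contains an empty bracket pair:

foreach ($hrefs as $a) {
    // strpos() returns false when '[]' is absent; === avoids treating a match at index 0 as false
    if (strpos($a, '[]') === false) {
        echo $a, PHP_EOL; // doesn't contain an empty bracket pair
    }
}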
Mukyuu
0

This one works:

<\S.*?=\"(.*reges\[\w+\].*)\">.*>

You can see it working at the link below. It captures the link from the first case in group 1 and matches nothing in the second case, where the [] is empty.

https://regex101.com/r/cdvVnP/1

Edit:

For the third case, it should look something along the lines of:

if (!str.contains("reges[")) {
  // passed(): pick up that link, as the string contains neither reges[] nor reges[some text]
} else {
  // match with <\S.*?=\"(.*reges\[\w+\].*)\">.*>
  // if a match is found, pick up that link from group 1
}
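If a single pattern is preferred over the two-step check, one option is a negative lookahead that refuses to consume an empty `[]` inside the href value. This is only a sketch under stated assumptions: href values are double-quoted, the markup is simple enough for a regex, and the names `LINK_RE`, `extractLinks`, and `sampleHtml` are hypothetical. It does not depend on the keyword `reges`.

```javascript
// Assumption: hrefs are double-quoted and the HTML is simple; a real parser is more robust.
// (?:(?!\[\])[^"])* consumes the href value one character at a time and
// refuses to step onto any position where an empty "[]" begins.
const LINK_RE = /<a\s[^>]*?href="((?:(?!\[\])[^"])*)"/g;

// Returns the captured href of every <a> tag whose href has no empty "[]".
function extractLinks(html) {
  return Array.from(html.matchAll(LINK_RE), (m) => m[1]);
}

// The three cases from the question.
const sampleHtml = [
  '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>',
  '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>',
  '<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>'
].join("\n");

console.log(extractLinks(sampleHtml)); // logs the first and third URLs
```

The lookahead keeps the whole rule in one match attempt per tag, at the cost of a pattern that is harder to read than a plain substring test.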
Amey Shirke
  • @Amey Shirke: thanks for your reply. My mistake, I did not describe the third case. A link that does not contain any square brackets should be passed. Please refer to case 3, which I have just updated. Please save me from this blood bath of regex. – Manish Chauhan Nov 27 '19 at 07:32
  • For that, you can just do !str.contains("reges\[\w+\]") before you match using the above regex. – Amey Shirke Nov 27 '19 at 08:40
  • Yes, that would work, but 2 comparisons will be inefficient. *reges is just a random word. – Manish Chauhan Nov 27 '19 at 10:50
  • Updated the code. The String.contains argument should be your keyword + [, e.g. "reges[". – Amey Shirke Nov 27 '19 at 10:56
  • Amey Shirke: yes, that will do the job, but is there a way to avoid two comparisons? – Manish Chauhan Nov 28 '19 at 06:14
  • As others have already mentioned, using regex for XML parsing may not be the best approach. As far as avoiding comparisons is concerned, I feel this is the closest you can get. – Amey Shirke Nov 28 '19 at 07:48