-2

I'm using the Flickr public API to build a small Express and React app. It returns a 'description' which is a string of HTML:

<p>
    <a href="URL">martin_king.photo</a> posted a photo:
</p> 
<p>
    <a href="URL_IMAGE" title="Potato Planting 2018 | FENDT // REEKIE">
        <img src="URL_IMAGE" width="240" height="160" alt="DESC_PHOTO" />
    </a>
</p>
<p> 
    FENDT 412 Vario TMS with the REEKIE ScanStone Separator <br/><br/> Follow Us: <br/> • 
    <a href="URL" rel="nofollow">Facebook - DYNASTYphotography</a> <br/> • 
    <a href="URL" rel="nofollow">Lukaskralphoto Instagram</a><br/> • 
    <a href="URL" rel="nofollow">my personal Instagram</a><br/> • 
    <a href="URL" rel="nofollow">YouTube Channel</a>
</p>

The first <a> tag contains the link I need; I just need some help writing a function or RegExp to extract it. This is my intended result:

getFirstLinkFromDescription = string => ... // 1st 'URL'

Thanks for any help!

R. García
    Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Zenoo Apr 19 '18 at 08:46

3 Answers

1

Regex is unreliable for parsing HTML. Instead, you could create DOM nodes from that HTML, which makes it easy to access the attributes of each element. Something like:

var html = '<p><a href="https://www.flickr.com/people/129661619@N08/">martin_king.photo</a> posted a photo:</p> <p><a href="https://www.flickr.com/photos/129661619@N08/41552236041/" title="Potato Planting 2018 | FENDT // REEKIE"><img src="https://farm1.staticflickr.com/899/41552236041_7183e7a533_m.jpg" width="240" height="160" alt="Potato Planting 2018 | FENDT // REEKIE" /></a></p> <p>FENDT 412 Vario TMS with the REEKIE ScanStone Separator<br /> <br /> You can follow us: <br /> • <a href="https://www.facebook.com/DYNASTYphotography-204551272942391/timeline/" rel="nofollow">Facebook - DYNASTYphotography</a> <br /> • <a href="https://www.instagram.com/lukaskralphoto/" rel="nofollow">Lukaskralphoto Instagram</a><br /> • <a href="https://www.instagram.com/martin939king/" rel="nofollow">my personal Instagram</a><br /> • <a href="https://www.youtube.com/user/HighlandBunnyLuv" rel="nofollow">YouTube Channel</a></p>';
var div = document.createElement("div");
div.innerHTML = html;
var anchor = div.querySelector("a");
console.log(anchor.href);
Titus
1

The function getUrl will return the first a tag's href or an empty string if the string passed to it doesn't have any a tags.

let htmlString = '<div><a href="www.google.com"/></div>';

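function getUrl(string) {
  // Parse the HTML by assigning it to a detached element
  const div = document.createElement('div');
  div.innerHTML = string;
  // querySelector returns the first match, or null if there is none
  const firstLink = div.querySelector('a');
  if (firstLink) return firstLink.getAttribute('href');
  return '';
}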
console.log(getUrl(htmlString));

console.log(getUrl('<p><a href="https://www.flickr.com/people/129661619@N08/">martin_king.photo</a> posted a photo:</p> <p><a href="https://www.flickr.com/photos/129661619@N08/41552236041/" title="Potato Planting 2018 | FENDT // REEKIE"><img src="https://farm1.staticflickr.com/899/41552236041_7183e7a533_m.jpg" width="240" height="160" alt="Potato Planting 2018 | FENDT // REEKIE" /></a></p> <p>FENDT 412 Vario TMS with the REEKIE ScanStone Separator<br /> <br /> You can follow us: <br /> • <a href="https://www.facebook.com/DYNASTYphotography-204551272942391/timeline/" rel="nofollow">Facebook - DYNASTYphotography</a> <br /> • <a href="https://www.instagram.com/lukaskralphoto/" rel="nofollow">Lukaskralphoto Instagram</a><br /> • <a href="https://www.instagram.com/martin939king/" rel="nofollow">my personal Instagram</a><br /> • <a href="https://www.youtube.com/user/HighlandBunnyLuv" rel="nofollow">YouTube Channel</a></p>'));
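
One detail that may matter when choosing between getAttribute('href') and the href property: getAttribute returns the attribute text exactly as written, while the href property returns the URL resolved against the page address. The sketch below approximates that resolution with the standard URL constructor so it can run outside a browser. The page address https://example.com/app/ is a made-up stand-in, not something from the question.

```javascript
// Raw attribute text, as getAttribute('href') would return it.
const rawHref = 'www.google.com';

// Hypothetical page address, used only for illustration.
const pageUrl = 'https://example.com/app/';

// Approximates what the href property reports: a value without a scheme
// is treated as a relative path and resolved against the page address.
const resolvedHref = new URL(rawHref, pageUrl).href;

console.log(rawHref);      // www.google.com
console.log(resolvedHref); // https://example.com/app/www.google.com
```

For the Flickr description the links are already absolute, so both approaches should give the same result there.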
Luca Kiebel
LellisMoon
1

You can do it with regex, but I don't think you need regex for that:

  var someString = '<p><a href="https://www.flickr.com/people/129661619@N08/">martin_king.photo</a> posted a photo:</p> <p><a href="https://www.flickr.com/photos/129661619@N08/41552236041/" title="Potato Planting 2018 | FENDT // REEKIE"><img src="https://farm1.staticflickr.com/899/41552236041_7183e7a533_m.jpg" width="240" height="160" alt="Potato Planting 2018 | FENDT // REEKIE" /></a></p> <p>FENDT 412 Vario TMS with the REEKIE ScanStone Separator<br /> <br /> You can follow us: <br /> • <a href="https://www.facebook.com/DYNASTYphotography-204551272942391/timeline/" rel="nofollow">Facebook - DYNASTYphotography</a> <br /> • <a href="https://www.instagram.com/lukaskralphoto/" rel="nofollow">Lukaskralphoto Instagram</a><br /> • <a href="https://www.instagram.com/martin939king/" rel="nofollow">my personal Instagram</a><br /> • <a href="https://www.youtube.com/user/HighlandBunnyLuv" rel="nofollow">YouTube Channel</a></p>',
      parser = new DOMParser(),
      link = parser.parseFromString(someString, "text/html").getElementsByTagName("a")[0].href;

  console.log(link); // "https://www.flickr.com/people/129661619@N08/"
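
If the extraction has to run on the Express side, note that Node has no built-in document or DOMParser, so a regex may be a pragmatic fallback for markup as regular as this. The following is only a minimal sketch, assuming the href value is always quoted. The function name comes from the question, and the pattern is an illustration rather than a general HTML parser (it does not decode entities such as &amp;amp;, for instance).

```javascript
// Sketch of a regex fallback; assumes the href value is wrapped in quotes.
const getFirstLinkFromDescription = (html) => {
  // <a, whitespace, optionally other attributes, then href="..." or href='...'
  const match = /<a\s+(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1/i.exec(html);
  // Group 2 holds the text between the matching quotes
  return match ? match[2] : '';
};

const sample =
  '<p><a href="https://www.flickr.com/people/129661619@N08/">martin_king.photo</a> posted a photo:</p>' +
  '<p><a href="https://www.flickr.com/photos/129661619@N08/41552236041/">photo</a></p>';

console.log(getFirstLinkFromDescription(sample));
// https://www.flickr.com/people/129661619@N08/
console.log(getFirstLinkFromDescription('<p>no links here</p>') === ''); // true
```

For anything less predictable than the Flickr feed, a real HTML parser is the safer choice.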
Luca Kiebel
ibrahim tanyalcin