
I want to take all links that begin with

<a class="execute" href="

from https://bitbucket.org/alceawisteria/ostr/issues

and then display them below in the current HTML document.

Is this possible with js? (If not, how can it be done otherwise ?)

Tried to implement approaches from the "GitHub issue widget" code to no avail.

  • Use `document.querySelectorAll("a.execute")` to select them, then loop over this and do whatever you want with them. – Barmar Jan 02 '23 at 19:48
  • The site (bitbucket) is on a different server tho. I believe I'd need the links straight from there and then embed it into the current dom's body. Can "document.query" really do all that ? – Rye Jan 02 '23 at 20:01
  • You used the `javascript` tag so I thought you were trying to select from the current page, not web-scraping. There should be something analogous in whatever language you're writing your code in. – Barmar Jan 02 '23 at 20:04
  • If you are trying to get links to all issues for a given repo, I'd recommend looking for an API instead of screen-scraping – Chris Haas Jan 02 '23 at 20:05
  • Bitbucket seems stingy with APIs in this regard. I'd be ok with PHP too (after all, it *can* scrape across webpages, it appears). – Rye Jan 02 '23 at 20:14
  • Really? Is this not what you are looking for? https://developer.atlassian.com/cloud/bitbucket/rest/api-group-issue-tracker/#api-repositories-workspace-repo-slug-issues-get? Seems to be working for that repo: https://api.bitbucket.org/2.0/repositories/alceawisteria/ostr/issues – Chris Haas Jan 02 '23 at 20:28
  • In a pinch, mayyybe. But I'd really prefer a more general approach that is not locked to an api. Parsing a part of a website and rendering it on the current one is so much more utilizable. – Rye Jan 02 '23 at 20:32
  • With 20+ years of experience, including a _lot_ of screen scraping, I'd gladly welcome an API in every scenario. Scraping is very fragile in the long run and often gets blocked since it almost always violates the ToS. But that's just me. You've got a two step process, get the HTML from the remote site and parse for links. For the former you can use curl, file_get_contents or a bunch of other things. Once you've got it, you can use a [parser](https://stackoverflow.com/a/4423796/231316) or [regex](https://stackoverflow.com/a/1732454/231316) – Chris Haas Jan 02 '23 at 20:46
  • Well then. 20 years is a lot, I suppose. I would need to know how to format Bitbucket's output to deliver *just* the links I outlined in the OP. Remember that I'm chained to web tools / libraries. Not sure how much "curl" I can do on my static webpage (GitHub Pages) ... – Rye Jan 02 '23 at 20:52
  • Here is such an implementation for GitHub, btw: https://codepen.io/ryedai1/pen/rNrOwaj – Rye Jan 02 '23 at 20:54
  • Unfortunately, the specific implementation I can't really help you with. I do see that GitHub appears to support CORS explicitly which means it is easier to do in JS, but the BitBucket API doesn't appear to be sending any CORS headers. – Chris Haas Jan 02 '23 at 22:19
  • Not the solution for this question per se (directly), but I now chose an indirect approach by reading a subsection (container) of the target site into the current dom... https://stackoverflow.com/questions/74993624/loading-only-part-of-site-into-dom-via-jquery – Rye Jan 03 '23 at 14:47

1 Answer


This solves the issue via PHP link scraping

<?php
// Fetch the issues page and parse it with PHP's built-in DOM extension
$url = 'https://bitbucket.org/alceawisteria/ostr/issues/';
$html = file_get_contents($url);

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings about malformed HTML

// Select every <a class="execute"> element
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[@class="execute"]');

foreach ($nodes as $node) {
    $href = $node->getAttribute('href'); // relative link, e.g. /alceawisteria/ostr/issues/...
    echo $node->nodeValue;
    echo " <a target='_blank' href='https://bitbucket.org" . $href . "'>";
    echo $href;
    echo "</a>", '<br>';
}
?>
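
As suggested in the comments, the Bitbucket REST API is a sturdier alternative to scraping the page markup. Below is a minimal, untested sketch assuming the endpoint mentioned in the comments (https://api.bitbucket.org/2.0/repositories/alceawisteria/ostr/issues) returns Bitbucket's usual paginated JSON, i.e. a `values` array in which each issue carries a `title` and an HTML link; the exact field names should be checked against the API documentation.

<?php
// Sketch: fetch issues from the Bitbucket REST API instead of scraping.
// Assumes a paginated response of the form:
//   { "values": [ { "title": ..., "links": { "html": { "href": ... } } }, ... ] }
$api  = 'https://api.bitbucket.org/2.0/repositories/alceawisteria/ostr/issues';
$json = file_get_contents($api);
$data = json_decode($json, true);

foreach ($data['values'] ?? [] as $issue) {
    $title = htmlspecialchars($issue['title'] ?? '');
    $href  = htmlspecialchars($issue['links']['html']['href'] ?? '');
    echo "<a target='_blank' href='" . $href . "'>" . $title . "</a><br>";
}
?>

The trade-off, as discussed in the comments, is that this ties the output to Bitbucket's API rather than to the page HTML, but it is far less likely to break when the site layout changes.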