I'm trying to add a dynamic web scraping function to my website that automatically gathers data from another website. Both websites have the same URL structure, and I use a JavaScript snippet on my website to generate the correct target URL.

<script type="text/JavaScript">
  document.getElementById("demo").innerHTML = "https://www.website2.com" + window.location.pathname;
</script>

Website 1. www.website1.com/test-123

Website 2. www.website2.com/test-123


I found the Simple HTML DOM Parser, which allows me to load a specific page and extract HTML elements from it.
However, it requires a target URL. Is it possible to use the result of the script as that URL directly?

Example: $html = file_get_html("#demo");?>

The code looks like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>


    <?php include("simple_html_dom.php");
    $html = file_get_html("https://www.website2.com/test-123");?>


</head>
<body>
<h1>Företag</h1>
<?php echo $html->find("h1",0)->plaintext;?>

<h5><?php echo $html->find("h1",0)->plaintext;?></h5>

<?php
echo $html->find("h1",0)->plaintext;
echo $html->find("p",0)->plaintext;
echo $html->find("p",1)->plaintext;
echo $html->find("p",2)->plaintext;
?>


<?php
    echo "<div id='demo'></div>";
?>




</body>
<script type="text/JavaScript">
  
  document.getElementById("demo").innerHTML = "https://www.bolagsfakta.se" + window.location.pathname;
    </script>
</html>
Werty
  • Use an existing headless browser. This stuff is very complicated to get right, largely because web pages are almost infinitely complex. Don't re-invent the wheel (especially one which realistically is likely to have half the spokes missing). That's just my advice anyway. If you're only intending to target a specific site with a known structure then it might be simpler, of course – ADyson Oct 07 '21 at 17:39
  • Anyway you could generate the URL structure with PHP just as easily as with JavaScript – ADyson Oct 07 '21 at 17:41
  • The target website has the same structure on all of its pages. Without an API, is there an easier way to display the info on my website? What would this string look like if I added the PHP code? – Werty Oct 08 '21 at 04:24
  • Well you can get the path of the request easily in PHP, see https://stackoverflow.com/a/16198831/5947043 . Then you can append that to the URL. No need for JavaScript. – ADyson Oct 08 '21 at 07:56
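
To make the last comment concrete: PHP runs on the server before the page is sent, so it never sees what the JavaScript writes into #demo, and file_get_html("#demo") cannot work. Below is a minimal sketch of building the URL in PHP instead. build_target_url is a hypothetical helper name (it is not part of Simple HTML DOM Parser), the base URL is the placeholder from the question, and the sketch assumes simple_html_dom.php has already been included.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM Parser): builds the
// website2 URL from the path of the current request to website1.
function build_target_url(string $requestUri, string $base = "https://www.website2.com"): string
{
    // parse_url() with PHP_URL_PATH keeps only the path, dropping any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || $path === "") {
        $path = "/"; // fall back to the site root if no path could be parsed
    }
    // Join base and path with exactly one slash between them.
    return rtrim($base, "/") . "/" . ltrim($path, "/");
}

// Runs only inside a web request where simple_html_dom.php is already loaded.
if (isset($_SERVER["REQUEST_URI"]) && function_exists("file_get_html")) {
    $html = file_get_html(build_target_url($_SERVER["REQUEST_URI"]));
    if ($html !== false) {
        // find() returns null when the element does not exist on the page.
        $heading = $html->find("h1", 0);
        echo $heading ? htmlspecialchars($heading->plaintext) : "";
    }
}
```

With this, a request to www.website1.com/test-123 would fetch https://www.website2.com/test-123 without any JavaScript. The checks for false and null are there because the remote page may be unreachable or may lack the expected element.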
