Scenario:

A user submits a URL (like the ones below). I want to return every link that appears in an `href` attribute on that page. I can collect the links with the DOMDocument class, but some of them are relative, and I want to convert those to absolute links.


My Problem:

Suppose I have these strings (URLs). The user may send any of them:

http://example.com/test/index.html
http://example.com/test/test.php
http://example.com/test/
http://example.com/test

I then read the `href` value of every link on the page at the user's URL. Some of those values are relative, so I have to merge them with the base URL. My problem is that I cannot work out the base URL. In this example it would be http://example.com/test, so whichever of the URLs above the user enters, I need to end up with the same base URL.

How can I extract the URL base correctly?

Sky
  • Take a look at this: [link](http://stackoverflow.com/questions/14912943/how-to-print-current-url-path) – Rouhollah Mazarei Sep 25 '16 at 08:23
  • No. `$_SERVER['HTTP_HOST']` returns current php host. I don't want this. I want to get from a string. – Sky Sep 25 '16 at 08:28
  • If `test` is your script root directory and you have not already defined it within a constant then consider to do it. – revo Sep 25 '16 at 08:44
  • Where do the strings come from? Maybe you can just check for the shortest string? – RST Sep 25 '16 at 08:50
  • @RST They come from user input. Users enter a URL and I want to get all `href` links. INCLUDING relative links. And of course I need to convert them to absolute path. – Sky Sep 25 '16 at 08:55
  • @revo Yes. `test` is my root directory. I can't define it. some sites don't define it. – Sky Sep 25 '16 at 08:57
  • Can you clarify the scenario by updating the question, please? You said you are getting these strings from your users. However, it sounds like these are the links on the pages you are crawling instead. Also, note that the base URL for a page can be defined via the [`<base>` element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base), so just looking at all the found URLs won't necessarily be correct. On another side note, there are a [dozen existing Q&As on how to convert from relative to absolute links](http://stackoverflow.com/search?q=convert+all+relative+to+absolute+links+%5Bphp%5D) – Gordon Sep 25 '16 at 09:08
  • I think you need to put more detail to your post. What do you mean by you want to get all href, are you crawling the domain? – RST Sep 25 '16 at 09:08
  • I've completely updated my question. Please read it again. Thanks. – Sky Sep 25 '16 at 09:58
  • So you mean a base URL can be different every time? – revo Sep 25 '16 at 10:56
  • @revo Yes, it depends on the user's input. For example, in `http://stackoverflow.com/questions/39684680/php-get-base-url-of-a-page`, the base URL is `http://stackoverflow.com/questions/39684680/php-get-base-url-of-a-page`, but in `http://example.com/index.html`, the base URL is `http://example.com` – Sky Sep 25 '16 at 11:40
  • What if `index.html` is a directory named that way? – revo Sep 25 '16 at 12:28
  • Then a trailing `/` would be necessary, like `example.com/index.html/`. Otherwise the browser treats it as a file. – Sky Sep 25 '16 at 13:09
  • What if `test` in `http://example.com/test` is a file name without an extension? – revo Sep 25 '16 at 13:11

1 Answer

You can use the `parse_url` function to split the URL into its components, like this:

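<?php
    $url = 'http://example.com/test';
    // parse_url() only splits the URL into parts; it does not resolve relative links
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);
    // scheme + host gives the site root, e.g. http://example.com
    $base_url = $scheme . '://' . $host;
    // path component of the URL, e.g. /test
    $path = parse_url($url, PHP_URL_PATH);
    // print base_url
    echo 'Base URL = ' . $base_url;
    // print path
    echo ' PATH = ' . $path;
?>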
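
The snippet above only gives you the site root and the path separately. Below is a rough sketch of how the two could be combined into the "directory" base the question asks for, and then used to make an `href` absolute. The function names `guess_base_url` and `to_absolute` are made up for this example. The rule that a last segment containing a dot is a file name is an assumption taken from the comment thread, not how browsers work: a browser would resolve links on `http://example.com/test` against `http://example.com/`. The sketch also ignores `../` segments, query-only and fragment-only links, and any `<base>` element in the page.

```php
<?php
// Hypothetical helper: guess the directory URL that relative links hang off.
// Assumption (heuristic, not browser behaviour): without a trailing slash,
// a last path segment containing a dot is a file name and is dropped.
function guess_base_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL we can work with
    }

    // Site root: scheme://host[:port]
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    $path = $parts['path'] ?? '/';
    $endsWithSlash = substr($path, -1) === '/';
    // Split the path and discard the empty pieces produced by the slashes
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    if (!$endsWithSlash && $segments && strpos(end($segments), '.') !== false) {
        array_pop($segments); // looks like a file name, e.g. index.html
    }

    return $segments ? $root . '/' . implode('/', $segments) : $root;
}

// Hypothetical helper: make one href absolute against a base from guess_base_url().
function to_absolute(string $href, string $base): string
{
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href; // already absolute (http:, https:, mailto:, ...)
    }
    $scheme = parse_url($base, PHP_URL_SCHEME);
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href; // protocol-relative link
    }
    if (strpos($href, '/') === 0) {
        // Root-relative link: attach it to scheme://host[:port]
        $root = $scheme . '://' . parse_url($base, PHP_URL_HOST);
        $port = parse_url($base, PHP_URL_PORT);
        return $root . ($port ? ':' . $port : '') . $href;
    }
    return rtrim($base, '/') . '/' . $href; // plain relative link
}

$base = guess_base_url('http://example.com/test/index.html'); // http://example.com/test
echo to_absolute('about.html', $base), "\n";  // http://example.com/test/about.html
echo to_absolute('/img/a.png', $base), "\n";  // http://example.com/img/a.png
```

With this heuristic all four URLs from the question give `http://example.com/test`. If you want the same results a browser would produce, resolve each link against the full page URL following RFC 3986 instead of guessing a directory.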