I am using the PHP library Simple HTML DOM Parser, as suggested here ( How do you parse and process HTML/XML in PHP? ), to parse a webpage's HTML content.

To create the DOM, I have to do:

$html = file_get_html('http://www.example.com/');

The problem is that if I do:

$html = file_get_html('www.example.com');

without specifying the URL's protocol, I will get an error.

My question is: given only the string "www.example.com", how can I find out whether the full URL is "http://www.example.com/" or "https://www.example.com/"?

AntonioJunior
  • Well, you can't. Domain names are quite independent from the protocol used - might as well be `ftp://`, or something even more exotic. (as for the error: it's trying to open a local file named `www.example.com` - you probably don't have that on your disk :)) – Piskvor left the building Aug 26 '11 at 01:08

3 Answers

There is no way to know, because both could be valid. I would assume http://, though, because the normal practice is to redirect http to https when https is required, and file_get_html should follow an HTTP 301 or 302 redirect.

Rajiv Makhijani

I can't think of anything smarter than assuming "http://" as the default and, if that fails, trying "https://":

if (!$html = file_get_html('http://' . $url)) $html = file_get_html('https://' . $url);
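
A slightly fuller sketch of the same fallback idea, in case the input sometimes already carries a scheme. The helper names `candidateUrls` and `fetchFirst` are hypothetical, not part of Simple HTML DOM Parser, and the sketch assumes the fetch function returns a falsy value on failure, as the one-liner above does:

```php
<?php
// Hypothetical helper: list the URLs to try for a possibly scheme-less input.
function candidateUrls(string $input): array
{
    $input = trim($input);
    // If a scheme such as "http://" or "https://" is already present, use it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        return [$input];
    }
    // Drop leading slashes so protocol-relative input like "//www.example.com" also works.
    $host = ltrim($input, '/');
    return ['http://' . $host, 'https://' . $host];
}

// Hypothetical helper: return the first truthy result of $fetch over the candidates,
// or false if every candidate fails.
function fetchFirst(string $input, callable $fetch)
{
    foreach (candidateUrls($input) as $url) {
        $result = $fetch($url);
        if ($result) {
            return $result;
        }
    }
    return false;
}
```

With the library loaded, this could be called as `$html = fetchFirst('www.example.com', 'file_get_html');`. Passing the fetcher as a callable keeps the URL logic separate from the network call.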
etuardu

You could call get_headers() on the http address and inspect the response headers for a sign that the server wants https, such as an Upgrade: header or a redirect to an https:// URL. If you get a valid response without such a sign, use http. Otherwise, try https.
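
A rough sketch of this header check. It is an assumption on my part that a `Location:` header pointing at an `https://` URL is the signal worth looking for, since that is how most sites redirect. The helper names `schemeFromHeaders` and `detectScheme` are hypothetical; `get_headers()` is the built-in PHP function, which by default follows redirects and returns the header lines of every hop, or false on failure:

```php
<?php
// Hypothetical helper: decide the scheme from the header lines returned by get_headers().
// Returns 'https' if any redirect points at an absolute https:// URL, otherwise 'http'.
function schemeFromHeaders(array $headers): string
{
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^Location:\s*https://#i', $line)) {
            return 'https';
        }
    }
    return 'http';
}

// Hypothetical helper: probe http first, then https; null if neither responds.
function detectScheme(string $host): ?string
{
    // @ suppresses the warning get_headers() emits when the request fails.
    $headers = @get_headers('http://' . $host);
    if ($headers !== false) {
        return schemeFromHeaders($headers);
    }
    $headers = @get_headers('https://' . $host);
    return $headers !== false ? 'https' : null;
}
```

Usage would be along the lines of `$scheme = detectScheme('www.example.com');` followed by `file_get_html($scheme . '://www.example.com')` when the result is not null. Relative `Location:` values are not treated as https redirects here.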

IslandCow