4

I am trying to speed up my Meteor application by only loading enough content of a webpage to get the <head> tag of its HTML, to obtain its title, image, and description. I have a client calling a server-side method with the following code:

Meteor.call("metaGetter", url, function(err, res){...});

And on the server side, in the metaGetter method, I am using Meteor's HTTP.call:

var result = HTTP.call('GET', url, {headers: {'content-range': "bytes 0-100"}});

as written in Meteor's documentation. I am able to get the result's content, html. However, after printing the returned headers, I do not see the content-range attribute that I have tried to set.

Edit: Akshat's solution works, but only for some websites, very few in fact. Any help would be much appreciated.

forallepsilon
  • why are you calling GET when you want to call HEAD? – Matt K May 26 '15 at 15:55
  • "A server must ignore a Range header field received with a request method other than GET" - as per http://stackoverflow.com/questions/18549051/http-head-request-with-range-header – forallepsilon May 26 '15 at 15:59
  • And trying this now, it does not load the content of the webpage - I am unable to access the necessary tags. – forallepsilon May 26 '15 at 16:03
  • ah, misunderstood. You want just the head tag in the HTML?? I don't think that's gonna work. the GET method returns `content` which contains all the html, i don't think you can request just a specific tag... – Matt K May 26 '15 at 16:14

2 Answers

2

Use the `range` request header instead:

var result = HTTP.call('GET', url, {headers: {'range': "bytes=0-100"}});

The response should include a content-range header if the server supports range requests.

Of course, this needs a host that supports range requests. I've tried the above code and it does work with http://www.microsoft.com as the URL.
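To tell the two cases apart in code, here is a minimal sketch. It assumes the result object has the `statusCode` and `headers` fields that Meteor's `HTTP.call` returns, with lowercase header names; `honoredRange` is a hypothetical helper name, not part of Meteor.

```javascript
// Returns true when the server answered with a partial body.
// A server that ignores the range header typically replies 200 with the full page.
function honoredRange(result) {
  const headers = result.headers || {};
  return result.statusCode === 206 &&
    typeof headers['content-range'] === 'string';
}

// Example with hand-written result objects (not real responses):
const partial = { statusCode: 206, headers: { 'content-range': 'bytes 0-100/5000' } };
const full = { statusCode: 200, headers: { 'content-length': '5000' } };
console.log(honoredRange(partial), honoredRange(full));
```

On the server you would pass in the value returned by `HTTP.call('GET', url, {headers: {'range': "bytes=0-100"}})`.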

Sadly, there's really nothing you can do for websites that don't support it besides requesting the entire document.

One rather weird alternative is to manually request the webpage as a socket and cut off when you get more bytes than what you need.
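A rough sketch of that idea, using Node's built-in `http`/`https` modules rather than a raw socket. `createByteLimiter` and `fetchFirstBytes` are hypothetical names, the sketch does not follow redirects, and I have not tested it against many sites.

```javascript
const http = require('http');
const https = require('https');

// Accumulates chunks and reports when the byte budget has been reached.
function createByteLimiter(maxBytes) {
  const chunks = [];
  let received = 0;
  return {
    // Returns true once at least maxBytes have been collected.
    push(chunk) {
      chunks.push(chunk);
      received += chunk.length;
      return received >= maxBytes;
    },
    // Returns everything collected so far, trimmed to maxBytes.
    result() {
      return Buffer.concat(chunks).subarray(0, maxBytes);
    }
  };
}

// Requests url and stops reading once maxBytes of the body have arrived.
// callback(err, text) is called exactly once.
function fetchFirstBytes(url, maxBytes, callback) {
  const client = url.startsWith('https:') ? https : http;
  const limiter = createByteLimiter(maxBytes);
  let finished = false;
  const finish = (err) => {
    if (finished) return;
    finished = true;
    callback(err, err ? undefined : limiter.result().toString('utf8'));
  };
  const req = client.get(url, (res) => {
    res.on('data', (chunk) => {
      if (limiter.push(chunk)) {
        finish(null);
        req.destroy(); // drop the connection; the rest of the body is never read
      }
    });
    res.on('end', () => finish(null)); // body was shorter than maxBytes
    res.on('error', finish);
  });
  req.on('error', finish);
}
```

Inside a Meteor method you would probably need to wrap `fetchFirstBytes` with `Meteor.wrapAsync` so it can be called synchronously.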

Tarang
  • I was quite unclear - I updated the question to indicate that I need to load the content of the webpage, and want its `<head>` tag. – forallepsilon May 26 '15 at 16:48
  • @forallepsilon I've updated the answer. You need to use the `range` header and place in your byte range – Tarang May 26 '15 at 17:23
  • I had tried using `range` originally, with no success. Is it possible the server does not support content ranges? – forallepsilon May 26 '15 at 17:36
  • @forallepsilon Yes this may be the reason. What server is it? If its a plain node server (and your own) you may be able to add in middleware to support it – Tarang May 26 '15 at 17:38
  • @forallepsilon Give it a test with this: http://www.cyberciti.biz/cloud-computing/http-status-code-206-commad-line-test/ – Tarang May 26 '15 at 17:39
  • This is actually working for some websites and not others. Forbes.com, for example, sees the content-length set to 101, whereas other websites still have the entire page. – forallepsilon May 26 '15 at 17:50
  • @forallepsilon I'm a bit confused on what you mean. This seems to work fine for me. I tried a blank project on `microsoft.com` and got what was expected. Unfortunately nothing can be done for websites that don't support request ranges. – Tarang Jun 01 '15 at 10:00
1

In general, you can't use a fixed byte limit if you always want to fetch the title:

  1. Some HTTP servers don't support the Range header (see "How can I find out whether a server supports the Range header?").
  2. You can't guarantee that the first X bytes will always contain the title. E.g. it may appear after 1000 bytes.

In general I would fetch the whole HTML file. On most decent servers, that should take less than 100 ms, which is hardly noticeable to a human. If you do that a lot, you may want to allow the server-side method to run in parallel (see http://docs.meteor.com/#/full/method_unblock).

If optimization is a must, you can use the previous method and fetch 100 bytes, but if you don't find </title>, then fall back to downloading the whole HTML file.
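A minimal sketch of that fallback, with the two fetchers passed in as functions so the logic stays independent of Meteor. `extractTitle`, `getTitle`, `fetchPartial` and `fetchFull` are hypothetical names, and the regex is a simplification rather than a real HTML parser.

```javascript
// Returns the text of the first complete <title> element, or null when
// the closing </title> tag is not present in the given HTML.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Tries the cheap partial fetch first and only downloads the whole
// page when the partial content did not contain a complete title.
function getTitle(url, fetchPartial, fetchFull) {
  const title = extractTitle(fetchPartial(url));
  if (title !== null) {
    return title;
  }
  return extractTitle(fetchFull(url));
}

// Example with stand-in fetchers that return fixed strings:
const page = '<html><head><meta charset="utf-8"><title>Hello</title></head></html>';
console.log(getTitle('http://example.com',
  function () { return page.slice(0, 30); }, // cut off before </title>
  function () { return page; }));
```

In a Meteor server method, `fetchPartial` could be something like `function (u) { return HTTP.call('GET', u, {headers: {'range': "bytes=0-100"}}).content; }` and `fetchFull` the same call without the header.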

Jakozaur