Is it possible to read only the first N bytes from an HTTP server using a Linux command?

Question

Given the url http://www.example.com, can we read the first N bytes out of the page?

Using wget, we can download the whole page.
Using curl, there is the -r option: 0-499 specifies the first 500 bytes. That seems to solve the problem.

You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
Using urllib in Python. There is a similar question here, but according to Konstantin's comment, is that really true?

Last time I tried this technique it failed because it was actually impossible to read from the HTTP server only specified amount of data, i.e. you implicitly read all HTTP response and only then read first N bytes out of it. So at the end you ended up downloading the whole 1Gb malicious response.

So, how can we read the first N bytes from the HTTP server in practice?

score 32 · Answer 1 · edited Apr 02 '22 at 05:04

You can do it natively by the following curl command (no need to download the whole document). According to the curl man page:

RANGES

HTTP 1.1 introduced byte-ranges. Using this, a client can request to get only one or more subparts of a specified document. curl supports this with the -r flag.

Get the first 100 bytes of a document:

    curl -r 0-99 http://www.get.this/

Get the last 500 bytes of a document:

    curl -r -500 http://www.get.this/

`curl` also supports simple ranges for FTP files as well.
Then you can only specify start and stop position.

Get the first 100 bytes of a document using FTP:

    curl -r 0-99 ftp://www.get.this/README

It works for me even with a Java web app deployed to GigaSpaces.
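
Since a server may ignore the Range header and answer with the whole document (status 200 instead of 206 Partial Content), it can help to cap the output on the client side as well. A minimal sketch, assuming a POSIX-like shell with curl and head installed; the function names `range_spec` and `first_n_bytes` and the URL in the usage comment are made up for illustration:

```shell
# Build the inclusive byte range for the first N bytes: N=500 -> "0-499".
range_spec() {
    printf '0-%d\n' "$(( $1 - 1 ))"
}

# Ask the server for only the first N bytes, then cap the output with head
# in case the server ignores the Range header and sends the whole document.
first_n_bytes() {
    url=$1
    n=$2
    curl -s -r "$(range_spec "$n")" "$url" | head -c "$n"
}

# Usage (hypothetical URL):
#   first_n_bytes http://www.example.com/ 500 > first500.bin
```

To see whether the range was honoured, something like `curl -s -o /dev/null -w '%{http_code}\n' -r 0-99 <url>` should print 206 for a partial response and 200 when the server sent everything.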

Note that the server has to support this option – Kyle Crawford Jul 21 '17 at 23:28

sehe · Accepted Answer · 2018-06-22T13:05:22.540

17

    curl <url> | head -c 499

or

    curl <url> | dd bs=1 count=499

should do

Also, there are simpler utilities with perhaps broader availability, like

    netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
    GET /urlpath/query?string=more&bloddy=stuff

    HERE

Or

    GET /urlpath/query?string=more&bloddy=stuff

edited Jun 22 '18 at 13:05

answered Apr 26 '11 at 08:32

sehe


2

Thanks. Using *curl* or *GET*, we can get the whole document. So with *dd* or *head*, we can cut the length. But is it possible we don't need to get the whole page? – hahakubile Apr 27 '11 at 02:03
6

Streaming. UNIX philosophy and pipes: they are data streams. Since curl and GET are unix filters, ending the receiving pipe (dd) will terminate curl or GET early (SIGPIPE). There is no telling whether the server will be smart enough to stop transmission. However on a TCP level I suppose it would stop retrying packets once there is no more response. – sehe Apr 27 '11 at 08:43
If the file is binary, you'll probably want to use dd. The dd command defaults to a blocksize of 512 bytes, so if you only want the first 499 bytes, you need to do 'dd bs=1 count=499'. Or if you only want the first 512 bytes, 'dd count=1' will do. – Adam F Dec 30 '12 at 17:41
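
The early-termination behaviour described in these comments can be tried without a network connection by putting an endless local producer in place of curl. A sketch, assuming GNU or BSD `head` and `dd`; the helper names `take_head` and `take_dd` are illustrative only:

```shell
# Keep only the first N bytes of standard input.
take_head() { head -c "$1"; }

# Same with dd: bs=1 turns count into a byte count, at the cost of one
# read per byte.
take_dd() { dd bs=1 count="$1" 2>/dev/null; }

# Usage: `yes` would write forever, but it is stopped (SIGPIPE) once the
# reader exits, just as curl is when head or dd closes the pipe.
#   yes | take_head 499 | wc -c
#   yes | take_dd 499 | wc -c
```

One caveat when using a larger block size on a pipe: a short read still counts as one block for dd, so fewer bytes than `bs * count` may come out. GNU dd has `iflag=fullblock` for that case.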

score 2 · Answer 3 · answered Apr 26 '11 at 08:32


You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.

You will have to get the whole page anyway, so you can fetch it with curl and pipe it to head, for example.

head

    -c, --bytes=[-]N    print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file

answered Apr 26 '11 at 08:32

Uxío


Now, I am using curl | head. Are there any commands that don't download the whole web page and just give the first N bytes? Thanks. – hahakubile Apr 27 '11 at 02:07

score 0 · Answer 4 · answered Dec 04 '18 at 15:52


I came here looking for a way to time the server's processing time, which I thought I could measure by telling curl to stop downloading after 1 byte or something.

For me, the better solution turned out to be to do a HEAD request, since this usually lets the server process the request as normal but does not return any response body:

time curl --head <URL>
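
If HEAD is not available, curl can also report the time until the first response byte for an ordinary GET. A sketch using curl's `-w` write-out variable `time_starttransfer`; the function name `ttfb` and the URL in the usage comment are hypothetical:

```shell
# Print the seconds elapsed from the start of the request until the first
# byte of the response was about to be transferred.
# The body is still downloaded here, but discarded via -o /dev/null.
ttfb() {
    curl -s -o /dev/null -w '%{time_starttransfer}\n' "$1"
}

# Usage (hypothetical URL):
#   ttfb http://www.example.com/
```

This figure includes DNS lookup and connection setup, so it is only an upper bound on the server's processing time.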

answered Dec 04 '18 at 15:52

Luc


2

Many servers, e.g. Amazon S3, explicitly disable `HEAD` requests. – Ian Kemp Mar 19 '19 at 13:31

score -1 · Answer 5 · answered Apr 26 '11 at 07:26


Make a socket connection. Read the bytes you want. Close, and you're done.
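
In bash this can be done without extra tools through the `/dev/tcp/host/port` pseudo-device. A rough sketch, assuming a bash built with network redirection support and a plain HTTP server on port 80; `build_request` and `http_first_bytes` are names invented here. Note that the first N bytes of the raw response include the status line and headers, not just the body:

```shell
# Compose a minimal HTTP/1.0 request. With HTTP/1.0 the server normally
# closes the connection after the response.
build_request() {
    printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1"
}

# Open a TCP connection on fd 3, send the request, read N bytes, then let
# the redirection go out of scope, which closes the socket.
# Arguments: host, path, number of bytes.
http_first_bytes() {
    {
        build_request "$1" "$2" >&3
        head -c "$3" <&3
    } 3<>"/dev/tcp/$1/80"
}

# Usage (hypothetical host and path):
#   http_first_bytes www.example.com / 500
```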

answered Apr 26 '11 at 07:26

Adam Dymitruk


Yes, you are right, @adymitruk. But apart from sockets, is there no Linux command that can handle this? – hahakubile Apr 26 '11 at 07:30
