The following is my code to download a webpage (I am writing a basic wget).
HTTP request:
import socket

port = 80
# assume ip, path and website are already known
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((ip, port))
http_message = "GET /" + path + " HTTP/1.1\r\n"
http_message += "Host: " + website + "\r\n"
http_message += "Accept: text/html\r\n"
http_message += "Accept-Language: en-US,en;q=0.9\r\n"
http_message += "Accept-Encoding: gzip, deflate\r\n"
http_message += "User-Agent: Chrome/92.0.4515.131 Mozilla/5.0 (X11; Linux x86_64)\r\n"
http_message += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
http_message += "Connection: keep-alive\r\n"
http_message += "\r\n"
x = sock.send(http_message.encode())
data = sock.recv(262144).decode("utf-8", "ignore")
print(data)
HTTP response on terminal:
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Tue, 07 Sep 2021 18:52:00 GMT
Content-Type: text/html
Last-Modified: Thu, 31 Oct 2019 09:15:26 GMT
Transfer-Encoding: chunked
Connection: keep-alive
ETag: W/"5dbaa62e-8d0e"
Content-Encoding: gzip
272b
n~}.^~ןO?^kd[%+djj \$s>t7H3o>6Osn>
D˴?vuv{&oơ;-%Z8jYvNy]ӌ7q<n6Mݣ#g7-ocO
g0M|8vc9ؒ̉mA5MkbH~zӊoE{-FS;۷!Q<iEs-Q4t
8�<O~Ֆ҉lo?`ǖ 6(~Lc7. *7ȳ#l6_#Ai?:m}';k278W9$/2~�dn?n-?/\uS>6]1..vM|:98a`0{^p@/=/)j=
r(=/zP}56HfMK36̍F{cMhI<)ؚ+68 u2
y/
Ui
5iîBENöL߷mq]t|
:
=0
g;tĹ?q} ЙJ
S\'0ZМ2ͦ,ޫOM:Ѳ90]LL('x'O1O?ߝ\3{>byx$ơsm݉.|H|G(41۩#?َ;{=
ҺwGOG'O<T٣Guy9Q
_&?q-7GZJ-urD;scKo*&V]j: 9>"vYs
-K.0"38үsl>v;Y0vv¨[U^UmQ`N[T
HL6ݵ7(~a^3"w"?#80&B^]"7U~b
-wJyXx,�aymwo
^ݚBLJ=)JMܶS{/[z3&ØW&g(w)di͍bL S:R$B2֯m[|#XBm^Ei9b|,~D2G*.
p7+(ù!u.w ?>t+}M,x!7'Ό9|0/#S/1UbA5Fui0dPO#枽,s˄7 l5$OVin֟eAӋ:YPLsӔm}۩c
^sEh6S
mӽwG=X}ΤV*-جk70FI`!jxCr"ϐ+bUo
RE[/WR1k|%j eBB(l3^H6cuP]PM-i[%h
The weird output continues....
The output above is in gzip format, and I am not able to decompress it to a text file.
I copied the weird output (everything except the HTTP response headers) from the terminal into output.txt.gz.
Then I used the gzip module:
import gzip
f=gzip.open('output.txt.gz','rb')
file_content=f.read()
print (file_content)
OUTPUT :
gzip.BadGzipFile: Not a gzipped file (b'27')
I can't work out what exact format gzip expects.
Also, if I don't decode the response:
data = sock.recv(262144)
I get a large block of binary data, which may help. [image: Binary Response Image]
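One observation that may narrow this down: the b'27' in the error matches the start of the 272b line in the response, which is the hexadecimal chunk-size line (0x272b = 10027 bytes) used by Transfer-Encoding: chunked. That suggests the copied data still carries chunk framing in front of the gzip stream. In addition, decode("utf-8", "ignore") silently drops bytes, so text copied from the terminal is unlikely to be valid gzip anymore. Below is a minimal sketch of handling both on the raw bytes. The helper names dechunk and decode_response are hypothetical, and the sketch assumes the complete response (up to the terminating zero-size chunk) has already been collected, which a single recv call does not guarantee.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Strip chunked framing: <hex size>\r\n<data>\r\n ... 0\r\n\r\n."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # last chunk; trailers, if any, are ignored in this sketch
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk data
    return bytes(out)


def decode_response(raw: bytes) -> str:
    """Split headers from body on the raw bytes, then undo chunking and gzip."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = head.decode("iso-8859-1").lower()
    if "transfer-encoding: chunked" in headers:
        body = dechunk(body)
    if "content-encoding: gzip" in headers:
        body = gzip.decompress(body)
    # Assumption: the page is UTF-8; the real charset may differ.
    return body.decode("utf-8", "replace")
```

With the socket code above, raw would be built by calling sock.recv in a loop and concatenating the undecoded bytes until the data ends with b"0\r\n\r\n" or the server closes the connection. A simpler alternative, if compression is not needed, is to drop the Accept-Encoding header so the server is not asked to send gzip at all.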