
I am trying to send a GET request to an API and read the response. My code works, but there is a problem with the response body: I am getting numbers before and after it. Here is my code:

import socket
import ssl

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(2)
s = ssl.create_default_context().wrap_socket(s, server_hostname='randomuser.me')
s.connect(("randomuser.me", 443))
request = "GET /api/ HTTP/1.1\r\nHost: randomuser.me\r\n\r\n"
s.sendall(request.encode())

result = b""
while True:
    try:
        data = s.recv(1024)
    except socket.timeout:  # nothing more arrived within 2 seconds
        break
    if not data:  # server closed the connection
        break
    result = result + data  # keep raw bytes; a character may span two recv() calls

s.close()
print(result.decode())

And here is my response:

HTTP/1.1 200 OK
Date: Tue, 29 Jun 2021 16:38:41 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
x-powered-by: Express
access-control-allow-origin: *
cache-control: no-cache
etag: W/"4a0-ZHCP6s4BF3NGwp45RPw24DhbGhw"
vary: Accept-Encoding
CF-Cache-Status: DYNAMIC
cf-request-id: 0afa3c3163000038cf7c3e4000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v2?s=fEMqmLbj7EmU8DoZiHR5%2B1uSK29U8clj2yhSA%2FQH%2BhpYdWJzWEGo3ua1Kgh6oHuloggqeZCDgbHTifYw%2FVYiRzgcV8HaIN%2FRHAvjwG1sg%2FH8vrZ5YODwmHwryw%3D%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 6670962f0de338cf-ATH
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400

4a0
{"results":[{"gender":"female","name":{"title":"Miss","first":"Sophia","last":"Bourgeois"},"location":{"street":{"number":6699,"name":"Rue du Village"},"city":"Orléans","state":"Morbihan","country":"France","postcode":70073,"coordinates":{"latitude":"-2.1449","longitude":"0.6974"},"timezone":{"offset":"-3:00","description":"Brazil, Buenos Aires, Georgetown"}},"email":"sophia.bourgeois@example.com","login":{"uuid":"e145adab-a136-4ef3-b9db-c5bdc3a32a75","username":"sadrabbit387","password":"smooth","salt":"gnz57npL","md5":"81b0e38cc190d63f6ecd275491a1feea","sha1":"b91e19770f3658500d37c11a5bd848c8baafe88e","sha256":"0a8ef17533b064723be96bee5d6338ded189e3184d5b9bdfc15af8243a3b8481"},"dob":{"date":"1984-02-26T02:12:49.648Z","age":37},"registered":{"date":"2016-09-21T08:57:41.436Z","age":5},"phone":"02-62-16-43-97","cell":"06-17-01-09-53","id":{"name":"INSEE","value":"2NNaN80745407 74"},"picture":{"large":"https://randomuser.me/api/portraits/women/76.jpg","medium":"https://randomuser.me/api/portraits/med/women/76.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/women/76.jpg"},"nat":"FR"}],"info":{"seed":"ad0ba79d620ddb6d","results":1,"page":1,"version":"1.3"}}
0 

Everything is fine except for the number before the body and the 0 after it. I am missing something but what? What are they? I am new to raw HTTP responses.

I also checked with Postman and curl, and their responses are fine: no such numbers appear. It is not specific to this API either, because I get numbers like '7ac5' from some other APIs, while others work fine. I don't get it and I want to understand it.

Edit:

Another example with a different API:

HTTP/1.1 200 OK
Date: Tue, 29 Jun 2021 17:06:59 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Access-Control-Allow-Origin: *
Cache-Control: max-age=100, public
CF-Cache-Status: DYNAMIC
cf-request-id: 0afa5616e5000054823eb60000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 6670bf9e3c545482-IST
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400

bcd
{"id":"btc-bitcoin","name":"Bitcoin","symbol":"BTC","rank":1,"is_new":false,"is_active":true,"type":"coin","tags":[{"id":"segwit","name":"Segwit","coin_counter":10,"ico_counter":0},{"id":"cryptocurrency","name":"Cryptocurrency","coin_counter":782,"ico_counter":40},{"id":"proof-of-work","name":"Proof Of Work","coin_counter":425,"ico_counter":15},{"id":"payments","name":"Payments","coin_counter":185,"ico_counter":39},{"id":"sha256","name":"Sha256","coin_counter":45,"ico_counter":1},{"id":"mining","name":"Mining","coin_counter":300,"ico_counter":18},{"id":"lightning-network","name":"Lightning Network","coin_counter":7,"ico_counter":0}],"team":[{"id":"satoshi-nakamoto","name":"Satoshi Nakamoto","position":"Founder"},{"id":"wladimir-j-van-der-laan","name":"Wladimir J. van der Laan","position":"Blockchain Developer"},{"id":"jonas-schnelli","name":"Jonas Schnelli","position":"Blockchain Developer"},{"id":"marco-falke","name":"Marco Falke","position":"Blockchain Developer"}],"description":"Bitcoin is a cryptocurrency and worldwide payment system. 
It is the first decentralized digital currency, as the system works without a central bank or single administrator.","message":"","open_source":true,"started_at":"2009-01-03T00:00:00Z","development_status":"Working product","hardware_wallet":true,"proof_type":"Proof of Work","org_structure":"Decentralized","hash_algorithm":"SHA256","links":{"explorer":["https://blockchair.com/bitcoin","http://blockchain.com/explorer","https://blockstream.info/","https://live.blockcypher.com/btc/","https://btc.cryptoid.info/btc/"],"facebook":["https://www.facebook.com/bitcoins/"],"reddit":["https://www.reddit.com/r/bitcoin"],"source_code":["https://github.com/bitcoin/bitcoin"],"website":["https://bitcoin.org/"],"youtube":["https://www.youtube.com/watch?v=Gc2en3nHxA4\u0026"]},"links_extended":[{"url":"https://bitcoin.org/en/blog","type":"blog"},{"url":"https://blockchair.com/bitcoin","type":"explorer"},{"url":"http://blockchain.com/explorer","type":"explorer"},{"url":"https://blockstream.info/","type":"explorer"},{"url":"https://live.blockcypher.com/btc/","type":"explorer"},{"url":"https://btc.cryptoid.info/btc/","type":"explorer"},{"url":"https://www.facebook.com/bitcoins/","type":"facebook"},{"url":"https://bitcointalk.org","type":"message_board"},{"url":"https://www.reddit.com/r/bitcoin","type":"reddit","stats":{"subscribers":3145581}},{"url":"https://github.com/bitcoin/bitcoin","type":"source_code","stats":{"contributors":975,"stars":55314}},{"url":"https://twitter.com/bitcoincoreorg","type":"twitter","stats":{"followers":126039}},{"url":"https://electrum.org/#download","type":"wallet"},{"url":"https://bitcoin.org/","type":"website"},{"url":"https://www.youtube.com/watch?v=Gc2en3nHxA4\u0026","type":"youtube"}],"whitepaper":{"link":"https://static.coinpaprika.com/storage/cdn/whitepapers/215.pdf","thumbnail":"https://static.coinpaprika.com/storage/cdn/whitepapers/217.jpg"},"first_data_at":"2010-07-17T00:00:00Z","last_data_at":"2021-06-29T17:05:00Z"}
0
a0zplt

1 Answer


"I am missing something but what?"

You are not taking into account that these responses are using the chunked transfer encoding format (via the Transfer-Encoding: chunked header) to send the body data in chunks instead of a single byte stream, as you are expecting. See RFC 2616 Section 3.6.1 and RFC 7230 Section 4.1 for more details on the chunked format.

"What are they?"

The numbers you are referring to are chunk size indicators.

  • In the 1st response shown, there is a single chunk of data whose byte size is 4a0 (0x4A0 hex, 1184 decimal), followed by a terminating chunk whose byte size is 0.

  • In the 2nd response shown, there is a single chunk of data whose byte size is bcd (0xBCD hex, 3021 decimal), followed by a terminating chunk whose byte size is 0.

The 0-length chunk ends the body data (there is no Content-Length or Connection: close header present to end the responses otherwise).
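As a quick check of the sizes quoted above, here is a minimal sketch of turning a chunk-size line into a byte count. The helper name `parse_chunk_size` is made up for this illustration; it also drops any chunk extension after a `;`, which the chunked format permits on the size line.

```python
def parse_chunk_size(line: bytes) -> int:
    """Convert one chunk-size line (hex digits, then CRLF) into a byte count."""
    # Anything after ';' is an optional chunk extension, not part of the size.
    size_field = line.split(b";", 1)[0].strip()
    return int(size_field.decode("ascii"), 16)


print(parse_chunk_size(b"4a0\r\n"))  # 1184
print(parse_chunk_size(b"bcd\r\n"))  # 3021
print(parse_chunk_size(b"0\r\n"))    # 0, the terminating chunk
```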

You won't be able to use a simple recv() loop to read chunked bodies. You have to detect the Transfer-Encoding: chunked header, and if it is present, read and parse each chunk individually: read a chunk size line (a hex number, optionally followed by chunk extensions) up to its CRLF, read the specified number of bytes, then skip the CRLF that follows the data. Repeat until a 0-length chunk is reached. Then read the set of trailing HTTP headers that may follow the chunks, up to the terminating blank line; these may overwrite headers that you read before the body.
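The reading procedure described above might be sketched in Python roughly as follows. This is an illustration under assumptions, not the pseudo code from the linked answers: the function names `read_headers` and `read_chunked_body` are hypothetical, the input is any binary file-like object (here an `io.BytesIO` holding a made-up response), and duplicate headers are simply overwritten.

```python
import io


def read_headers(stream):
    """Read the status line and header block; return (status_line, headers)."""
    status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
    headers = {}
    while True:
        line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # a blank line (or EOF) ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # simplification: last duplicate wins
    return status_line, headers


def read_chunked_body(stream):
    """Decode a chunked body; return (body_bytes, trailer_lines)."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # Hex size, optionally followed by ';' and chunk extensions.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("stream ended in the middle of a chunk")
        body += chunk
        stream.readline()  # skip the CRLF that follows the chunk data
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line ends the trailers
            break
        trailers.append(line.rstrip(b"\r\n"))
    return bytes(body), trailers


# Made-up response used only to exercise the functions above.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
    b"7;ext=1\r\n, world\r\n"
    b"0\r\n"
    b"X-Checksum: abc\r\n"
    b"\r\n"
)
stream = io.BytesIO(raw)
status_line, headers = read_headers(stream)
if "chunked" in headers.get("transfer-encoding", "").lower():
    body, trailers = read_chunked_body(stream)
    print(body)      # b'hello, world'
    print(trailers)  # [b'X-Checksum: abc']
```

Because the functions only need `readline()` and `read()`, the same code should work on a buffered file object wrapped around a real socket instead of the `BytesIO` used here.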

See the pseudo code I present in this answer and this answer.

Remy Lebeau
  • I get it now, thanks a lot. In some APIs there are more than two hex numbers, and this explains all of that. I cannot stop at where chunk sizes are, but what I can do is this: I can read all the body and then split it by its CRLF points. Either that will solve the problem, or I need to use another API that does not use a chunked body. Edit: Actually, I read your pseudo codes and what I dont understand is the 'recv() a line of text until CRLF' part. How can I do this? I am just reading x bytes. Can I say something like "read until that CRLF"? – a0zplt Jun 29 '21 at 18:14
  • "*I cannot stop at where chunk sizes are*" - yes, you can, just adjust your reading logic accordingly, like I describe in the other answers I linked to. "*I can read all the body and then split it by its CRLF points*" - don't do it that way. Do it the way I describe. For one thing, those CRLFs are not part of the data itself. And also, there can be other metadata in each chunk that would have to be stripped out as well. And also, stripping off the trailing headers following the last chunk. It is really not hard to just adjust your reading logic to handle the chunks properly from the beginning. – Remy Lebeau Jun 29 '21 at 18:23
  • "*what I dont understand is the 'recv() a line of text until CRLF' part. How can I do this?*" - you could simply read 1 byte at a time until LF is reached. Alternatively, Python's `socket` class has a [`makefile()`](https://docs.python.org/3/library/socket.html#socket.socket.makefile) method that has a `newline` parameter. Use that to create a [file object](https://docs.python.org/3/glossary.html#term-file-object) that reads from the socket, and then you can read a line of data from the file object using its `readline()` method. – Remy Lebeau Jun 29 '21 at 18:30
  • I have literally been hunting for this post the last 48 hours, I honestly can't thank you enough. Saved my butt. – wuzz Sep 12 '21 at 18:41
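For readers wondering what the `makefile()` suggestion in the comments looks like in practice, here is a small sketch. It assumes a local `socket.socketpair()` as a stand-in for a real server connection, and the function name `read_lines_via_makefile` is invented for the example. The Python documentation cautions that a timeout on the underlying socket can leave the file object's buffer in an inconsistent state, so a blocking socket is the safer assumption.

```python
import socket


def read_lines_via_makefile():
    """Read one chunk from a socket using readline() and read()."""
    # socketpair() gives two connected sockets; one plays the server here.
    server_side, client_side = socket.socketpair()
    try:
        server_side.sendall(b"4\r\nWiki\r\n0\r\n\r\n")
        server_side.close()
        # Binary mode keeps the CRLF bytes untouched.
        with client_side.makefile("rb") as reader:
            size_line = reader.readline()           # b"4\r\n"
            size = int(size_line.strip().decode("ascii"), 16)
            data = reader.read(size)                # exactly `size` bytes
            reader.readline()                       # CRLF after the chunk data
            last_size_line = reader.readline()      # b"0\r\n"
        return size_line, data, last_size_line
    finally:
        client_side.close()


print(read_lines_via_makefile())  # (b'4\r\n', b'Wiki', b'0\r\n')
```

The same `makefile("rb")` call is available on the socket returned by `wrap_socket()`, so the approach should carry over to the TLS connection in the question.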