I want to download the content of a webpage. When I make a GET request to example.com, I am able to make the connection.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    //Stream sockets and rcv()
    
    struct addrinfo hints, *res;
    int sockfd;
    
    char buf[2056];
    int byte_count;
    
    //get host info, make socket and connect it
    memset(&hints, 0,sizeof hints);
    hints.ai_family=AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("example.com","80", &hints, &res);
    sockfd = socket(res->ai_family,res->ai_socktype,res->ai_protocol);
    printf("Connecting...\n");
    connect(sockfd,res->ai_addr,res->ai_addrlen);
    printf("Connected!\n");
    char *header = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n";
    send(sockfd,header,strlen(header),0);
    printf("GET Sent...\n");
    //all right ! now that we're connected, we can receive some data!
    byte_count = recv(sockfd,buf,sizeof(buf)-1,0); // <-- -1 to leave room for a null terminator
    buf[byte_count] = 0; // <-- add the null terminator
    printf("recv()'d %d bytes of data in buf\n",byte_count);
    printf("%s",buf);
    return 0;
}

But if, in place of example.com, I use http://info.cern.ch/ or http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf (to download the PDF), I get a segmentation fault (with both port 80 and port 443).

Code that doesn't work:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    //Stream sockets and rcv()
    
    struct addrinfo hints, *res;
    int sockfd;
    
    char buf[2056];
    int byte_count;
    
    //get host info, make socket and connect it
    memset(&hints, 0,sizeof hints);
    hints.ai_family=AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf","80", &hints, &res);
    sockfd = socket(res->ai_family,res->ai_socktype,res->ai_protocol);
    printf("Connecting...\n");
    connect(sockfd,res->ai_addr,res->ai_addrlen);
    printf("Connected!\n");
    char *header = "GET /index.html HTTP/1.1\r\nHost: http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf\r\n\r\n";
    send(sockfd,header,strlen(header),0);
    printf("GET Sent...\n");
    //all right ! now that we're connected, we can receive some data!
    byte_count = recv(sockfd,buf,sizeof(buf)-1,0); // <-- -1 to leave room for a null terminator
    buf[byte_count] = 0; // <-- add the null terminator
    printf("recv()'d %d bytes of data in buf\n",byte_count);
    printf("%s",buf);
    return 0;
}
  • Use a library like libcurl instead of trying to roll your own HTTP client. – Shawn Apr 05 '21 at 09:25
  • Please show the exact code that **doesn't** work. – prog-fh Apr 05 '21 at 09:40
  • You MUST correctly and completely handle the results returned from system calls like socket(), connect(), send() and recv(). – Martin James Apr 05 '21 at 09:52
  • @Shawn I am trying to do it without libcurl. – Daniel Wayne Apr 05 '21 at 10:07
  • @prog-fh I have added the code that does not work. – Daniel Wayne Apr 05 '21 at 10:09
  • @MartinJames The segmentation fault happens at sockfd = socket(res->ai_family,res->ai_socktype,res->ai_protocol); so I infer that the previous line, getaddrinfo("http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf","80", &hints, &res);, is failing. I don't know how to correct it. This is easy with libcurl, but I want to do it without it. – Daniel Wayne Apr 05 '21 at 10:11

1 Answer

getaddrinfo() and Host: (in the header) should specify the hostname (not the full URI).

In your example this is galileoandeinstein.physics.virginia.edu.
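To make the hostname/URI distinction concrete, here is a minimal sketch of splitting a URL into the two pieces that go to different places: the host (for getaddrinfo() and the Host: header) and the path (for the GET line). The helper name split_http_url is hypothetical and not from the question; it assumes a plain "http://" URL with no port or user info.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original code): split
 * "http://host/path" into host and path. Returns 0 on success,
 * -1 if the URL does not start with "http://" or a buffer is too small.
 * Assumption: no port, user info, or other URL features are handled. */
int split_http_url(const char *url, char *host, size_t host_len,
                   char *path, size_t path_len)
{
    const char *prefix = "http://";
    size_t prefix_len = strlen(prefix);
    const char *start, *slash;
    size_t n;

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;

    start = url + prefix_len;
    slash = strchr(start, '/');          /* first '/' after the host */
    n = slash ? (size_t)(slash - start) : strlen(start);
    if (n == 0 || n >= host_len)
        return -1;
    memcpy(host, start, n);
    host[n] = '\0';

    if (slash == NULL)
        slash = "/";                     /* no path given: request the root */
    if (strlen(slash) >= path_len)
        return -1;
    strcpy(path, slash);
    return 0;
}
```

For the URL in the question, host would hold galileoandeinstein.physics.virginia.edu and path would hold /lectures/newton.pdf.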

Because you don't check the return value of getaddrinfo(), you don't notice that the call failed and that res does not point to a valid result. Dereferencing res afterwards (res->ai_family and so on) is what produces the segmentation fault.
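A minimal sketch of the missing check, assuming the same hints as in the question. The wrapper name resolve_host is hypothetical; getaddrinfo(), gai_strerror() and freeaddrinfo() are the standard calls.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around getaddrinfo(). Returns 0 and sets *res on
 * success. On failure it prints the reason, sets *res to NULL and returns
 * the non-zero getaddrinfo() error code, so the caller never dereferences
 * an invalid pointer. */
int resolve_host(const char *host, const char *port, struct addrinfo **res)
{
    struct addrinfo hints;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    *res = NULL;
    rc = getaddrinfo(host, port, &hints, res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s, %s): %s\n",
                host, port, gai_strerror(rc));
        *res = NULL;
    }
    return rc;
}
```

The caller would stop as soon as resolve_host() returns non-zero, and call freeaddrinfo() on the result once it is no longer needed. The return values of socket(), connect(), send() and recv() deserve the same treatment.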

The request-header should be something like

"GET /lectures/newton.pdf HTTP/1.1\r\n"
"Host: galileoandeinstein.physics.virginia.edu\r\n"
"Connection: close\r\n"
"\r\n"

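The Connection: close header is not mandatory, but it simplifies your experiment: the server closes the connection once the response has been sent, so you can keep reading until recv() returns 0.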
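Putting the request header and Connection: close together, a rough sketch could look like the following. Both function names (build_get_request, read_until_close) are hypothetical. The read loop writes raw bytes with fwrite() rather than printf("%s"), on the assumption that the body may be binary (a PDF contains NUL bytes). It does not separate the response headers from the body and does not retry on EINTR.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/* Hypothetical helper: format a GET request for `path` on `host`.
 * Returns the request length, or -1 if `out` is too small. */
int build_get_request(char *out, size_t out_len,
                      const char *host, const char *path)
{
    int n = snprintf(out, out_len,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}

/* Hypothetical helper: copy everything the peer sends to `out` until the
 * peer closes the connection (recv() returns 0). Returns the number of
 * bytes copied, or -1 on a recv() or write error. The output still
 * contains the HTTP response headers in front of the body. */
long read_until_close(int sockfd, FILE *out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = recv(sockfd, buf, sizeof buf, 0)) > 0) {
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    if (n < 0) {
        perror("recv");
        return -1;
    }
    return total;
}
```

A single recv() call, as in the question, returns only whatever has arrived so far, so a loop like this is needed for anything larger than one buffer.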

In order to experiment with HTTPS, this example could be a good starting point.

prog-fh
  • Thanks. After correcting the domain name and the request header, it works for most HTTP websites. However, whenever I try to download an HTTPS website, it fails. For example, when I try to download https://en.wikipedia.org/wiki/Main_Page, it shows no error but does not download the page. I have attached the code: https://pastebin.com/ZXUPzDYA. It is the same as in the question, except that I am using an HTTPS website. – Daniel Wayne Apr 05 '21 at 11:32
  • @DanielWayne HTTPS is HTTP via SSL, so you need some SSL utilities that perform encryption/decryption over the TCP connection. – prog-fh Apr 05 '21 at 12:37
  • Could you provide some links or resources for that? One good resource I found: https://stackoverflow.com/questions/62011930/c-to-perform-https-requests-with-openssl – Daniel Wayne Apr 05 '21 at 13:34
  • Where can I learn these things in depth? Is there a specific book or link that explains all of this with examples in C? – Daniel Wayne Apr 05 '21 at 17:18
  • @DanielWayne [This site](https://wiki.openssl.org/index.php/Main_Page) contains documentation and examples. – prog-fh Apr 05 '21 at 17:23