I'm trying to scrape all the links from yahoo.com and also get the size of the page itself. If I set User-Agent = "Mozilla/5.0"
on my HTTP request, I can scrape all the links, but the content-length comes back as 0.
    let client = reqwest::blocking::Client::new();
    let response = client.get(link)
        .header("User-Agent", "Mozilla/5.0");
    match response.send() {
        Ok(rep) => {
            // content_length() is read first because text() consumes the response
            Some((rep.content_length().unwrap(), rep.text().unwrap()))
        },
        Err(_e) => {
            None
        }
    }
Here's the result from the terminal:
[the list of links scraped from www.yahoo.com appears here; omitted for brevity] url:https://www.yahoo.com/; size:0
The terminal shows that I scraped https://www.yahoo.com, but the content-length received is 0.
However, if I remove the line .header("User-Agent", "Mozilla/5.0"); from the same code,
I do receive a content-length, something like 183174, but then I can't scrape any links from yahoo.com.
If I cheat by calling len() on the HTML text I receive, I get a size of around 600000.
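
For reference, here is a minimal sketch of that len() fallback. The name page_size is a hypothetical helper of mine, not part of reqwest; it assumes the Option<u64> from content_length() was saved before text() was called. One possible reason the two numbers differ is that Content-Length, when present, describes the bytes sent over the wire (possibly compressed), while len() counts the bytes of the decoded text. I have not verified that this is what yahoo.com does.

```rust
/// Hypothetical helper (not a reqwest API): choose a page size in bytes.
/// `content_length` is the value previously returned by the response's
/// `content_length()`; `body` is the text already read from the response.
fn page_size(content_length: Option<u64>, body: &str) -> u64 {
    match content_length {
        // Use the reported value only when it is present and non-zero.
        Some(n) if n > 0 => n,
        // Otherwise fall back to the length of the decoded body.
        // `str::len` counts UTF-8 bytes, not characters.
        _ => body.len() as u64,
    }
}
```

With the values above, page_size(Some(0), &html) would return html.len() as a u64, while page_size(Some(183174), &html) would return 183174.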