Can anyone get a proper response from www.coupang.com? I keep making requests to "https://www.coupang.com/" and get an error 9 times out of 10. (Surprisingly, it sometimes works.)

Traceback (most recent call last):
        14: from lib/add_sup/test.rb:7:in `<main>'
        13: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:485:in `get_response'
        12: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:609:in `start'
        11: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:910:in `start'
        10: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:487:in `block in get_response'
         9: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:1365:in `request_get'
         8: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:1464:in `request'
         7: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:1491:in `transport_request'
         6: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:1491:in `catch'
         5: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http.rb:1494:in `block in transport_request'
         4: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http/response.rb:29:in `read_new'
         3: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/http/response.rb:40:in `read_status_line'
         2: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/protocol.rb:167:in `readline'
         1: from /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/protocol.rb:157:in `readuntil'
/Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/net/protocol.rb:181:in `rbuf_fill': Net::ReadTimeout (Net::ReadTimeout)

I have also tried making the same request with python3, and it works fine. I think there is something wrong with my Ruby configuration or with Ruby itself.

require 'nokogiri'
require 'open-uri'
require 'net/http'


uri = URI("https://www.coupang.com/")
res = Net::HTTP.get_response(uri)
puts res.body if res.is_a?(Net::HTTPSuccess)

I would appreciate your kind thoughts on this matter. Thank you!

Jamie
  • Now the error reads /Users/j/.rbenv/versions/2.5.3/lib/ruby/2.5.0/openssl/buffering.rb:182:in `sysread_nonblock': Operation timed out (Errno::ETIMEDOUT) – Jamie Sep 02 '19 at 05:19
  • Is it perhaps taking more than a minute to respond? Try increasing the `read_timeout` (in seconds) for the call. – Surya Sep 02 '19 at 09:06
  • It really shouldn't take more than a minute because python3 takes less than 2 seconds to read. I've tried increasing read_timeout to 100 and no luck – Jamie Sep 02 '19 at 09:50
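
For reference, here is a minimal sketch of how the `read_timeout` suggested in the comments can be set on a plain `Net::HTTP` client. The helper name `build_http` and the 10-second `open_timeout` are assumptions made for illustration; as noted above, raising the timeout to 100 did not resolve the problem for this site.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a Net::HTTP client with explicit timeouts.
# Nothing is sent over the network until a request is actually made.
def build_http(uri, read_timeout: 100, open_timeout: 10)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed for opening the connection
  http.read_timeout = read_timeout # seconds allowed for each read from the socket
  http
end

uri = URI("https://www.coupang.com/")
http = build_http(uri)

# To send the request (left commented out so the sketch has no side effects):
# res = http.request(Net::HTTP::Get.new(uri))
# puts res.body if res.is_a?(Net::HTTPSuccess)
```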

1 Answer

They're using Akamai, so first, they expect HTTP/2 (you'll need an HTTP/2 gem for that), and second, they do some fairly tight User-Agent sniffing.

Here's an example that works using the net-http2 gem:

require 'net-http2'

client = NetHttp2::Client.new "https://www.coupang.com/"
res = client.call :get, '/', headers: { "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)" }
puts res.body if res.ok?
client.close
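
If the call is needed in more than one place, it could be wrapped roughly as follows. This is only a sketch: `BROWSER_HEADERS` and `fetch_body` are hypothetical names, and the `nil` guard is a defensive assumption for the case where no response comes back. The only net-http2 calls relied on are the ones already used above (`call`, `ok?`, `body`, `close`).

```ruby
# Hypothetical constant: the browser-like User-Agent from the example above.
BROWSER_HEADERS = {
  "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)"
}.freeze

# Hypothetical helper: issues a GET through the given client and returns the
# body on success, or nil if there is no response or it is not OK.
def fetch_body(client, path = "/", headers: BROWSER_HEADERS)
  res = client.call(:get, path, headers: headers)
  return nil if res.nil? || !res.ok?
  res.body
end

# Usage with the net-http2 gem (commented out so the sketch has no side effects):
# require 'net-http2'
# client = NetHttp2::Client.new("https://www.coupang.com/")
# begin
#   puts fetch_body(client)
# ensure
#   client.close
# end
```

Passing the client in as an argument keeps the helper independent of how the connection is created, so it can be reused or closed by the caller.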
smathy
  • I can get a response using the net-http2 gem. Thank you! But another problem has come up: I cannot select certain elements using Nokogiri. It seems like the response is leaving out some information. – Jamie Sep 03 '19 at 05:22
  • ```ruby
    url = "https://coupang.com/"
    client = NetHttp2::Client.new(url)
    res = client.call :get, '/', headers: { "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)" }
    doc = Nokogiri::HTML(res.body)
    # returns empty array
    doc.css(".recommendation-list")
    ``` – Jamie Sep 03 '19 at 05:31
  • I think Nokogiri is having trouble with the JavaScript-rendered part of the HTML. Do you have any suggestions? – Jamie Sep 03 '19 at 09:03
  • If my answer resolved *this* problem then you should accept it as the answer, and then you should create a new question showing the new problem you're talking about. – smathy Sep 04 '19 at 19:36