
I’m making requests to the Reddit API. First, I set a subreddit top URL:

reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')

All of these correctly get the contents:

Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')

Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]

URI.open(reddit_url, 'User-Agent' => 'My agent').read

But then I try it with a URL for a specific post:

reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')

Both Net::HTTP and Open3/curl fail, returning only empty strings. URI.open continues to work, as does opening the URL in a web browser.

Why doesn’t the second request work with two of the solutions? And why does it work with URI.open, when that’s supposed to be “an easy-to-use wrapper for Net::HTTP”? What does it do differently, and how can I replicate it with Net::HTTP and curl?

user137369

1 Answer


Working with your example, and focussing on Net::HTTP for simplicity, the first example doesn't work as written:

require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => TypeError: no implicit conversion of URI::HTTPS into String

Instead I used this as my starting point:

require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
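The same single request can also be written with the block form of Net::HTTP.start, which turns on use_ssl from the URL scheme and closes the connection when the block returns. This is a minimal sketch; fetch_once is a hypothetical helper name, and because it makes exactly one request, a 301 is returned as-is rather than followed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: one GET request with a custom User-Agent.
# Passing headers through a request object works on Ruby 2.x and 3.x,
# unlike Net::HTTP.get(uri, headers), which needs Ruby 3.0.
def fetch_once(url, agent = 'My agent')
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    request = Net::HTTP::Get.new(uri.request_uri, 'User-Agent' => agent)
    http.request(request) # the block's value (the response) is returned; redirects are NOT followed
  end
end
```

For example, fetch_once('https://www.reddit.com/r/pixelart/top.json') should behave like the http.get call above.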

With that working we can try the second URL. Interestingly, I get different results depending on whether I re-use the initial connection or make a new one:

require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')

http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615

http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0

result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
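To see where that 301 points, you can read its Location header (result_two['location']). Location may be relative, so a small helper can resolve it against the URL that was requested. A sketch; redirect_target is a hypothetical name, and it only reports the target without fetching it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: given the URL you requested and the response you got,
# return the absolute URL a redirect points to, or nil for non-redirects.
def redirect_target(request_url, response)
  return nil unless response.is_a?(Net::HTTPRedirection) # any 3xx response
  location = response['location']
  return nil if location.nil? || location.empty?

  # A relative Location such as "/foo" is resolved against the request URL.
  URI.join(request_url, location).to_s
end
```

Usage with the objects above: redirect_target(reddit_url_two.to_s, result_two).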

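So I suspect you're sometimes getting a 301 redirect, and that's causing the confusion. Net::HTTP doesn't follow redirects on its own, and curl only does so with --location, whereas URI.open (open-uri) follows them by default. Note also that the second URL's host is reddit.com rather than www.reddit.com, which may be why reusing the first connection (opened to www.reddit.com) returns 200. There's another question and answer on this site about how to follow redirects.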
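Following redirects with Net::HTTP means doing by hand what URI.open does for you: if the response is a 3xx, read Location, resolve it against the current URL, and request again. A minimal sketch, assuming a hypothetical get_following_redirects helper and an arbitrary limit of 5 hops; it gives up at the limit rather than detecting loops.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL, following up to `limit` redirects
# (Net::HTTP never follows them on its own).
def get_following_redirects(url, agent: 'My agent', limit: 5)
  uri = URI.parse(url)
  (limit + 1).times do
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(uri.request_uri, 'User-Agent' => agent))
    end
    # Anything that is not a 3xx with a Location header is the final answer.
    return response unless response.is_a?(Net::HTTPRedirection) && response['location']

    # Location may be absolute or relative; resolve it against the current URL.
    uri = URI.join(uri.to_s, response['location'])
  end
  raise "gave up after #{limit} redirects"
end
```

With the second URL from the question, get_following_redirects('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json').body would return the body of whatever the redirect chain ends at.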

Matthew
  • The first example works on Ruby 3.0 but doesn’t seem to work on older versions. That might explain the discrepancy on that one. You’re correct that the redirect was the problem (can’t believe I forgot to try `curl` with `--location`), though weirdly it seems to point to the same URL. – user137369 Feb 18 '21 at 19:17
  • You're right, apparently I'm still on 2.7.2. I'm still not sure why reusing the first connection works... and I couldn't see a redirect happening when accessed from a browser... a few puzzling things there but glad you're on track now! – Matthew Feb 19 '21 at 06:29
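For the other two approaches from the question: curl needs --location (as noted in the comments) to follow the 301, and open-uri can be told not to follow redirects, which makes the difference visible. With redirect: false, URI.open raises OpenURI::HTTPRedirect, whose uri is the redirect target. A sketch; both helper names are hypothetical, /usr/bin/curl is the path from the question, and --silent is an extra flag that only hides curl's progress meter.

```ruby
require 'open3'
require 'open-uri'

# Hypothetical helper: the curl command from the question plus --location,
# which makes curl follow 3xx redirects instead of printing the empty 301 body.
def curl_args(url, agent = 'My agent')
  ['/usr/bin/curl', '--silent', '--location', '--user-agent', agent, url.to_s]
end

def curl_get(url, agent = 'My agent')
  stdout, _status = Open3.capture2(*curl_args(url, agent))
  stdout
end

# Hypothetical helper: ask open-uri NOT to follow redirects. It then raises
# OpenURI::HTTPRedirect, and the exception's #uri is where the server pointed.
# Returns nil when the URL does not redirect.
def open_uri_redirect_target(url, agent = 'My agent')
  URI.open(url, 'User-Agent' => agent, redirect: false, &:read)
  nil
rescue OpenURI::HTTPRedirect => e
  e.uri.to_s
end
```

Calling open_uri_redirect_target with the reddit.com URL should show where the 301 points, while curl_get with the same URL should return the post's JSON rather than an empty string.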