To summarize the issue I need help with:

  • I wrote a Python program that sends a GET request to https://bx.in.th/api/pairing/
  • The program works well on my machine (Mac OS X).
  • When run on a DigitalOcean Ubuntu droplet, it raises an HTTP 403 Forbidden error.
  • I spent a day researching, and most answers suggest modifying the request headers. I tried them all without success.

Some links/references I went through.

Here is simplified source code that reproduces the problem:

import urllib.request
import json

url = 'https://bx.in.th/api/pairing/'

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive'
}

request = urllib.request.Request(url, headers=headers)

response = urllib.request.urlopen(request)

print(response.read())
print()
print(response.getheaders())

The expected output is:

b'{"1":{"pairing_id":1,"primary_currency":"THB","secondary_currency":"BTC"},"21":{"pairing_id":21,"primary_currency":"THB","secondary_currency":"ETH"},"22":{"pairing_id":22,"primary_currency":"THB","secondary_currency":"DAS"},"23":{"pairing_id":23,"primary_currency":"THB","secondary_currency":"REP"},"20":{"pairing_id":20,"primary_currency":"BTC","secondary_currency":"ETH"},"4":{"pairing_id":4,"primary_currency":"BTC","secondary_currency":"DOG"},"6":{"pairing_id":6,"primary_currency":"BTC","secondary_currency":"FTC"},"24":{"pairing_id":24,"primary_currency":"THB","secondary_currency":"GNO"},"13":{"pairing_id":13,"primary_currency":"BTC","secondary_currency":"HYP"},"2":{"pairing_id":2,"primary_currency":"BTC","secondary_currency":"LTC"},"3":{"pairing_id":3,"primary_currency":"BTC","secondary_currency":"NMC"},"26":{"pairing_id":26,"primary_currency":"THB","secondary_currency":"OMG"},"14":{"pairing_id":14,"primary_currency":"BTC","secondary_currency":"PND"},"5":{"pairing_id":5,"primary_currency":"BTC","secondary_currency":"PPC"},"19":{"pairing_id":19,"primary_currency":"BTC","secondary_currency":"QRK"},"15":{"pairing_id":15,"primary_currency":"BTC","secondary_currency":"XCN"},"7":{"pairing_id":7,"primary_currency":"BTC","secondary_currency":"XPM"},"17":{"pairing_id":17,"primary_currency":"BTC","secondary_currency":"XPY"},"25":{"pairing_id":25,"primary_currency":"THB","secondary_currency":"XRP"},"8":{"pairing_id":8,"primary_currency":"BTC","secondary_currency":"ZEC"}}'

[('Date', 'Sun, 13 Aug 2017 09:27:02 GMT'), ('Content-Type', 'text/javascript'), ('Content-Length', '1485'), ('Connection', 'close'), ('Set-Cookie', '__cfduid=d51c37ea835bae4a0c892e91f34f7bc131502616422; expires=Mon, 13-Aug-18 09:27:02 GMT; path=/; domain=.bx.in.th; HttpOnly'), ('Cache-Control', 'max-age=86400'), ('Expires', 'Mon, 14 Aug 2017 09:27:02 GMT'), ('Strict-Transport-Security', 'max-age=0'), ('X-Content-Type-Options', 'nosniff'), ('Server', 'cloudflare-nginx'), ('CF-RAY', '38daa2e36e0a836b-BKK')]

The error I get when running the source code on the droplet:

Traceback (most recent call last):
  File "api-call.py", line 17, in <module>
    response = urllib.request.urlopen(request)
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Thank you!

  • It may be blocking DigitalOcean's IPs – t.m.adam Aug 13 '17 at 10:00
  • Is there a body attached to the response? Catch the exception, you can use `.read()` on it to get the response body. It may be the server included more detail as to why it responded with a 403. – Martijn Pieters Aug 13 '17 at 10:04
  • @MartijnPieters how would you do that? – t.m.adam Aug 13 '17 at 10:11
  • Run `curl https://bx.in.th/api/pairing/` on your server to see if they are blocking your IP range. – Himal Aug 13 '17 at 10:13
  • @t.m.adam: handling exceptions you mean? https://docs.python.org/3/tutorial/errors.html#handling-exceptions – Martijn Pieters Aug 13 '17 at 10:13
  • @MartijnPieters how would you read the response? I'm genuinely curious. – t.m.adam Aug 13 '17 at 10:14
  • @t.m.adam: I already told you; catch the exception; the exception object has a `.read()` method. – Martijn Pieters Aug 13 '17 at 10:18
  • @MartijnPieters thanks for clarifying. I thought you meant `read()` the response. – t.m.adam Aug 13 '17 at 10:22
  • @t.m.adam: the exception object represents the response. – Martijn Pieters Aug 13 '17 at 10:24
  • @MartijnPieters yes, got it now – t.m.adam Aug 13 '17 at 10:25
  • @Himal I tried running the command and got an HTML page (as text) without an error. – Tawan Thampipattanakul Aug 13 '17 at 10:28
  • @TawanThampipattanakul You mean a `text/javascript` response? You could use `curl -I https://bx.in.th/api/pairing/` to get the headers only so you can see the HTTP status code. – Himal Aug 13 '17 at 12:37
  • I followed your suggestion @MartijnPieters and retrieved the HTML bytes from exceptionObject.read(). I got the following page: Please enable cookies. One more step Please complete the security check to access bx.in.th Why do I have to complete a CAPTCHA? Completing the CAPTCHA proves you are a human and gives you temporary access to the web property. ... Cloudflare Ray ID: 38db27a5582c6fcc • Your IP: 128.1XX.XX.XX • Performance & security by Cloudflare – Tawan Thampipattanakul Aug 13 '17 at 14:03
  • @TawanThampipattanakul: so they are using cloudflare to protect their site and digital ocean's IP address is subject to extra checks. – Martijn Pieters Aug 13 '17 at 14:16
  • @MartijnPieters so... is this considered impossible? – Tawan Thampipattanakul Aug 13 '17 at 14:43
  • @TawanThampipattanakul: Cloudflare certainly tries to make it impossible; perhaps you need to respect the restrictions that the site admins have put in place? – Martijn Pieters Aug 13 '17 at 15:37
  • @MartijnPieters I know the reason. bx.in.th is currently a frequent target of DDoS attacks, so I believe they set up a restriction against DO droplets' IPs, since droplets are a low-cost and efficient way to launch such attacks. – Tawan Thampipattanakul Aug 13 '17 at 15:52
  • Thank you so much for all of your help :) To summarize, the API I want to access uses Cloudflare to block DO IPs for security reasons, so I decided to use another cloud provider instead. – Tawan Thampipattanakul Aug 13 '17 at 15:57
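
The diagnostic step suggested in the comments above (catch the exception and call `.read()` on it, since the exception object represents the response) can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the function name `fetch_status_and_body` and its `opener` parameter are hypothetical, added only so the request can be swapped out.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, headers=None, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes) for both successful and failed requests.

    `opener` is a hypothetical hook (defaults to urllib.request.urlopen) so
    the network call can be replaced, for example when experimenting offline.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it carries the status
        # code, the headers, and a readable body.
        return err.code, err.read()
```

Calling `fetch_status_and_body('https://bx.in.th/api/pairing/')` from the droplet and printing the returned body would show whatever page the server sent along with the 403, which is how the Cloudflare CAPTCHA page quoted above was found.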

2 Answers

You have to use a robust proxy service such as Luminati. I was also getting a 403 error status, but the request works well through a Luminati proxy.

jis0324
  • My apologies, I somehow missed this answer. Just want to confirm that when I ran into this same problem a few years after I opened this question, a friend suggested a Luminati proxy; I tried it and it worked pretty well for me. – Tawan Thampipattanakul Jan 16 '22 at 20:10
  • OK, sounds good. Hope you are doing well. Thanks. – jis0324 Jan 17 '22 at 18:12

I had a similar problem on DigitalOcean.

The solution is to sign up for a proxy service and route the request through it. Note: Luminati is now Bright Data (brightdata.com).

Below is an example from them.

I suggest using Python's requests module and setting up your call like this:

import requests

proxies = {'http': 'http://brd-customer-hl_234567a0-zone-isp:0123456789ab@zproxy.lum-superproxy.io:22225',
           'https': 'http://brd-customer-hl_234567a0-zone-isp:0123456789ab@zproxy.lum-superproxy.io:22225'}
url = 'https://bx.in.th/api/pairing/'
headers = {'User-Agent': 'Mozilla/5.0 etc'}
r = requests.get(url, headers=headers, proxies=proxies, timeout=10)

r.status_code # should be 200, not 403

Use r.text or r.json() to read the API data from the response object.

Strictly speaking, only the https proxy is needed for this example, but it is good practice to include both.
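
Since the question uses urllib.request rather than requests, the same idea can probably be expressed with the standard library's `ProxyHandler`. The sketch below is an assumption-laden illustration: the helper name `build_proxy_opener` is hypothetical, and the proxy URL is a made-up placeholder, not a real Bright Data endpoint or credential.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Build an opener that routes both http and https requests via proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


# Hypothetical placeholder: substitute the URL and credentials that your
# proxy provider gives you.
PROXY_URL = "http://username:password@proxy.example.com:8080"

opener = build_proxy_opener(PROXY_URL)
# Headers set here are sent with every request made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# Uncomment to send the request through the proxy:
# response = opener.open("https://bx.in.th/api/pairing/", timeout=10)
# print(response.status)
```

The request line is left commented out because it only succeeds with a working proxy account; whether the API then returns 200 still depends on how the site treats the proxy's IP range.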