
I am trying to scrape this page, and this is the code I use:

```python
page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
```

I get this error when I run it:

```
Traceback (most recent call last):
  File "/Users/lakesh/WebScraping/Gold.py", line 46, in <module>
    page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
  File "/Library/Python/2.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/adapters.py", line 511, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.uobgroup.com', port=443): Max retries exceeded with url: /online-rates/gold-and-silver-prices.page (Caused by SSLError(SSLError(1, u'[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)'),))
```

I also tried disabling certificate verification:

```python
page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)
```

That doesn't work either. I need some guidance.
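Some background on why `verify=False` cannot help here: `sslv3 alert handshake failure` is an alert the server sends during the TLS handshake, typically when client and server share no acceptable protocol version or cipher suite. `verify=False` only turns off certificate validation, so it does not change what the client offers. A minimal diagnostic sketch (Python 3.6+, since `SSLContext.get_ciphers()` does not exist in Python 2.7) that lists what the local Python/OpenSSL build offers by default, so it can be compared with what the server accepts; the function name is hypothetical:

```python
import ssl


def offered_cipher_suites():
    """Return the OpenSSL version string and the cipher suite names that a
    default client context would offer in its ClientHello."""
    ctx = ssl.create_default_context()
    names = [cipher["name"] for cipher in ctx.get_ciphers()]
    return ssl.OPENSSL_VERSION, names


if __name__ == "__main__":
    version, names = offered_cipher_suites()
    print(version)
    print(len(names), "cipher suites offered by default")
    for name in names:
        print(" ", name)
```

An old OpenSSL build may simply lack the suites a server requires, which is one reason to print the version alongside the list.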

Full code:

```python
from requests import get
import requests
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from collections import defaultdict
import json

requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DES-CBC3-SHA'
page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]
```
lakshmen
  • You might get some guidance from https://stackoverflow.com/questions/51991891/python3-and-requests-still-getting-sslv3-alert-handshake-failure – Dave W. Smith Apr 28 '19 at 02:26
  • Tried `requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DES-CBC3-SHA'` and got this error: `name 'requests' is not defined` – lakshmen Apr 28 '19 at 02:54
  • I'm guessing from the stacktrace that your code likely has a `from requests import get`. If that's the case, add an `import requests`. – Dave W. Smith Apr 28 '19 at 03:36
  • Still can't get the data. I've posted my code as well. – lakshmen Apr 28 '19 at 03:54
  • The guidance I was hinting at is that the site may be doing something odd with which cipher(s) it accepts. `curl -v ...` shows that site accepting a connection with DHE-RSA-AES256-GCM-SHA384. Try that. – Dave W. Smith Apr 28 '19 at 04:34
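Following the suggestion to try DHE-RSA-AES256-GCM-SHA384: rather than overwriting the module-level `DEFAULT_CIPHERS` (which, in urllib3 1.x, affects every HTTPS connection in the process), one common pattern is a custom `HTTPAdapter` that hands urllib3 its own `ssl.SSLContext` for a single host. This is a sketch assuming Python 3 and a recent `requests`/`urllib3`; the cipher string is taken from the comment's `curl -v` observation and may no longer match the server, and `CipherAdapter`/`make_session` are hypothetical names:

```python
import requests
from requests.adapters import HTTPAdapter
from requests.certs import where as ca_bundle_path
from urllib3.util.ssl_ import create_urllib3_context

# Assumption: the cipher reported as accepted in the comments (via `curl -v`).
UOB_CIPHERS = "DHE-RSA-AES256-GCM-SHA384"


class CipherAdapter(HTTPAdapter):
    """HTTPAdapter whose TLS context offers only the given OpenSSL cipher string."""

    def __init__(self, ciphers, **kwargs):
        # Must be set before HTTPAdapter.__init__, which calls init_poolmanager().
        self._ciphers = ciphers
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        # create_urllib3_context() passes the string to SSLContext.set_ciphers().
        ctx = create_urllib3_context(ciphers=self._ciphers)
        # Load the CA bundle explicitly; with a custom context, some
        # requests/urllib3 versions do not load CA certificates for you.
        ctx.load_verify_locations(ca_bundle_path())
        kwargs["ssl_context"] = ctx
        return super().init_poolmanager(*args, **kwargs)


def make_session(ciphers=UOB_CIPHERS):
    session = requests.Session()
    # Only HTTPS URLs under this prefix use the restricted cipher list.
    session.mount("https://www.uobgroup.com/", CipherAdapter(ciphers))
    return session
```

Usage would be `page = make_session().get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")`. Note that `set_ciphers()` does not restrict TLS 1.3 suites, so if both sides speak TLS 1.3 the pinned string has no effect.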

1 Answer


I added the `verify=False` option and also removed the line that sets the cipher. With those two changes, your code worked for me in Python 3, but only sometimes: it works once, and then seems not to work for a while. My guess is that the site is rate-limiting access, possibly based on the User-Agent it sees, to limit bot traffic. I printed `last_table` when it worked, and here's what I got:

```html
<table class="responsive-table-rates table table-striped table-bordered" id="nova-funds-list-table">
<tbody>
<tr>
<td style="background-color: #002265; text-align: center; color: #ffffff;">DESCRIPTION</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">CURRENCY</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">UNIT</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK SELLS</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK BUYS</td>
<td style="text-align: left; display: none;"> </td>
<td style="text-align: left; display: none;"> </td>
</tr>
</tbody>
</table>
```
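The question's code stops after selecting `last_table`. Judging from the output printed above, the header row uses `<td>` cells (not `<th>`) and each row is padded with cells hidden via `display: none`. A sketch of turning such a table into one dict per data row, assuming data rows follow the same layout as the header row (only the header row is visible in the output above); the function names are hypothetical:

```python
from bs4 import BeautifulSoup


def visible_cell_texts(row):
    """Text of each <td>/<th> in a row, skipping cells styled with display: none."""
    texts = []
    for cell in row.find_all(["td", "th"]):
        style = (cell.get("style") or "").replace(" ", "").lower()
        if "display:none" not in style:
            texts.append(cell.get_text(strip=True))
    return texts


def table_to_records(table):
    """Map each data row to a dict keyed by the first row's cell texts."""
    rows = table.find_all("tr")
    if not rows:
        return []
    header = visible_cell_texts(rows[0])
    return [dict(zip(header, visible_cell_texts(row))) for row in rows[1:]]


def last_table_records(html_text):
    """Records from the page's last <table>, or None when the page has no table."""
    tables = BeautifulSoup(html_text, "html.parser").find_all("table")
    if not tables:
        return None  # indexing find_all('table')[-1] here would raise IndexError
    return table_to_records(tables[-1])
```

With a response in hand, `records = last_table_records(page.content)` gives a list of dicts keyed by DESCRIPTION, CURRENCY, UNIT, BANK SELLS and BANK BUYS, or `None` instead of an exception when the page has no table.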

I'm dumping the response body to a file. When it works, I get readable HTML. When it doesn't, I get a few readable lines at the top followed by a large block of what looks like obfuscated JavaScript; I'm not sure what it is. In that case the script fails with:

```
Traceback (most recent call last):
  File "/Users/stevenjohnson/lab/so/ReadAFile.py", line 8, in <module>
    last_table = html.find_all('table')[-1]
IndexError: list index out of range
```

I get back a 200 status code in either case.
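Because both pages come back with status 200, `raise_for_status()` cannot tell them apart; the body itself has to be checked. A minimal sketch, assuming (as the answer guesses, without confirmation) that the table-less page is transient, that retries with a growing delay until the body looks valid. `fetch_until` and `has_table` are hypothetical helpers, and the fetch function is passed in so the logic can be exercised without network access:

```python
import time


def fetch_until(fetch, is_valid, attempts=5, delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call fetch() until is_valid(result) is true, sleeping between tries.

    Returns the first valid result; raises RuntimeError after `attempts` tries.
    The delay is multiplied by `backoff` after each failed attempt.
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if is_valid(result):
            return result
        if attempt < attempts:
            sleep(delay)
            delay *= backoff
    raise RuntimeError(f"no valid response after {attempts} attempts")


def has_table(html_text):
    # Cheap substring check; parsing with BeautifulSoup would be more robust.
    return "<table" in html_text.lower()
```

For example, `body = fetch_until(lambda: get(url, verify=False).text, has_table)`, where `url` is the rates page URL. If the alternate page really is bot protection, retrying may not help, and sending a browser-like `User-Agent` header via `headers=` is another variable worth testing.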

Here's my version of the code:

```python
from requests import get
from bs4 import BeautifulSoup
from collections import defaultdict

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]
print(last_table)
```

I'm on a Mac. Maybe you're not, and the certificate chains on your machine differ from mine, so you can't get as far as I can. Either way, I wanted you to know that your code does work for me once I add `verify=False` and remove the cipher line.
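To test the theory that the two machines differ, it can help to print the TLS settings each Python installation actually uses and compare the output side by side. A sketch for Python 3; `tls_environment` is a hypothetical name. By default, `requests` verifies certificates against the `certifi` bundle returned by `requests.certs.where()` rather than the system store, unless that is overridden (for example with the `REQUESTS_CA_BUNDLE` environment variable):

```python
import ssl

import requests.certs


def tls_environment():
    """Collect TLS details that can differ between two machines running the same script."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl": ssl.OPENSSL_VERSION,          # library Python's ssl module is linked against
        "system_cafile": paths.cafile,           # may be None if OpenSSL's default file is absent
        "system_capath": paths.capath,
        "requests_ca_bundle": requests.certs.where(),
    }


if __name__ == "__main__":
    for key, value in tls_environment().items():
        print(f"{key}: {value}")
```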

CryptoFool
  • Well, then I don't know what's going on. It runs fine for me on my Mac under Python 2.7.16 and 3.7.3. I'm running High Sierra and bs4-0.0.1. – CryptoFool Apr 28 '19 at 04:47
  • My code is the same as yours, except that I added `verify=False` to the end of the `get` call. - It's running consistently now for me. Have you tried lately? Maybe it was something with the site. – CryptoFool Apr 28 '19 at 04:51
  • Oh wait! My code isn't the same: I commented out the `requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DES-CBC3-SHA'` line. I'll post my code... – CryptoFool Apr 28 '19 at 04:53
  • Now it's not working again. Something is flaky with this site, I think. Now it worked again. It's very random when it works and when it doesn't. I don't think it's the number of accesses: I just ran it successfully 8 times in a row, and then it failed on the 9th attempt. – CryptoFool Apr 28 '19 at 04:57
  • Oh...BTW...I'm not getting an authentication or SSL error. I'm always getting back content. I'm just sometimes getting back an error page, or so it seems. – CryptoFool Apr 28 '19 at 04:59
  • I added the error I'm getting when it doesn't work to my answer. It's a parse error because I'm getting back a different page that doesn't have a table in it. – CryptoFool Apr 28 '19 at 05:01
  • Bummer. Probably something about your SSL setup. – CryptoFool Apr 28 '19 at 05:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192507/discussion-between-lakesh-and-steve). – lakshmen Apr 28 '19 at 05:05