
Can someone explain how to set the User-Agent header so I can avoid the "403 Forbidden" error I get when I try to read tabular data from the coincheckup.com web page?

This is the test code:

import pandas as pd
tables = pd.read_html("https://coincheckup.com/")
print(tables[0])

Additional question: how can I read specific data from another site? Can I use the pandas library for that too? The site in question is samcrypto.com, and I would like to read the BTC and ETH values.

Best regards!

Emil

1 Answer


At the time of writing, pandas' read_html does not let you change the User-Agent header it sends.

Your best bet is to download the page with urllib.request (urllib2 in Python 2) or another library, such as requests, that lets you set the User-Agent, and then pass the downloaded HTML to pandas.
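For the urllib route, a minimal Python 3 sketch could look like the following. The User-Agent string, the 30-second timeout and the helper names build_request/read_tables are assumptions for illustration, not anything the site or pandas requires. Wrapping the HTML in io.StringIO avoids the deprecation warning that newer pandas versions emit when read_html is given a literal HTML string.

```python
import io
import urllib.request

import pandas as pd

# Hypothetical browser-like value; which strings the site accepts is not known.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper)"}


def build_request(url: str) -> urllib.request.Request:
    """Create a request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def read_tables(url: str) -> list:
    """Download the page with urllib and let pandas parse its <table> elements."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # read_html raises ValueError("No tables found") if the HTML has no <table>.
    return pd.read_html(io.StringIO(html))


if __name__ == "__main__":
    tables = read_tables("https://coincheckup.com/")
    print(tables[0])
```

Note that urllib stores header names with only the first letter capitalized, so the header is looked up as "User-agent" on the Request object; HTTP header names are case-insensitive, so the server sees the same header either way.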

Here's an example using requests:

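import io

import pandas as pd
import requests

# Send a custom User-Agent so the server does not answer with 403 Forbidden
headers = {'User-Agent': 'Custom User Agent'}
response = requests.get('https://coincheckup.com', headers=headers)
response.raise_for_status()  # raise an error on 4xx/5xx responses

# Wrap the HTML in StringIO; newer pandas versions deprecate passing raw HTML strings
tables = pd.read_html(io.StringIO(response.text))
print(tables[0])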
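read_html only sees the HTML that the server returns, so if it raises "No tables found", it can help to count the <table> tags in the downloaded page directly. The sketch below uses only the standard library's html.parser; the names TableCounter and count_tables are hypothetical. A count of zero suggests the table is added later by JavaScript, which requests does not run.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TABLE> is counted too
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    # requests is only needed for the live check, not for count_tables itself
    import requests

    headers = {"User-Agent": "Custom User Agent"}
    response = requests.get("https://coincheckup.com", headers=headers, timeout=30)
    response.raise_for_status()
    print("tables in raw HTML:", count_tables(response.text))
```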
Tim Shaffer
  • Thank you for your answer. It worked, but I get a message that there is no table. Looking at the page source, it seems I'm going to have trouble reading this data, or I don't understand how this scraping works :-). The content of the table is dynamic, so I don't know how to read it. Maybe I'm on the wrong track... – Emil Mar 16 '18 at 19:51
  • You're on the right track. requests does not execute any JavaScript, which is the problem. For that, you need a tool that can render pages built with JavaScript; Selenium is a popular choice. See this question: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – Tim Shaffer Mar 16 '18 at 20:01
  • I installed Selenium, but according to the sample code I also need to install a web driver. I don't have a desktop environment on my Raspberry Pi (everything runs in the terminal because of the 4 GB SD card), so this may not work. I will try installing LightDM; maybe that will help... – Emil Mar 16 '18 at 21:14
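A minimal sketch of the Selenium approach, assuming Selenium 4 with Firefox and a geckodriver binary for the Pi's architecture on PATH. Firefox's headless mode should not need a desktop environment, so LightDM may be unnecessary. The helper names render_page and pick_rows, and the "Symbol" column name, are hypothetical; inspect tables[0].columns to find the real column before filtering for BTC and ETH. The same pattern could be tried on samcrypto.com, but its page structure is not known here.

```python
import io

import pandas as pd


def render_page(url: str, timeout: int = 30) -> str:
    """Load a page in headless Firefox and return the HTML after JavaScript ran."""
    # Imported here so pick_rows() below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no window or desktop session needed
    driver = webdriver.Firefox(options=options)  # assumes geckodriver is installed
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted at least one <table>
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()


def pick_rows(table: pd.DataFrame, column: str, symbols: list[str]) -> pd.DataFrame:
    """Keep only the rows whose `column` value is one of `symbols` (e.g. BTC, ETH)."""
    return table[table[column].isin(symbols)]


if __name__ == "__main__":
    html = render_page("https://coincheckup.com/")
    tables = pd.read_html(io.StringIO(html))
    # "Symbol" is a hypothetical column name; check tables[0].columns first
    print(pick_rows(tables[0], "Symbol", ["BTC", "ETH"]))
```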