
I am trying to read a CSV file directly from a website. Below is my Python 3 code:

import pandas as pd
url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
data = pd.read_csv(url)

But I got the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Input In [6], in <cell line: 3>()
      1 import pandas as pd
      2 url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
----> 3 data = pd.read_csv(url)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

Any clue? Many thanks.
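The traceback above stops before its final line, which is where the HTTP status code would appear. Here is a minimal sketch for surfacing that status. It assumes the failure is the `urllib.error.HTTPError` named at the top of the traceback, which pandas lets propagate from its URL download. `describe_http_error` and `read_csv_verbose` are hypothetical helper names.

```python
import urllib.error

import pandas as pd


def describe_http_error(err: urllib.error.HTTPError) -> str:
    # err.code is the numeric HTTP status; err.reason is the server's reason phrase
    return f"HTTP {err.code} ({err.reason})"


def read_csv_verbose(source) -> pd.DataFrame:
    """Call pd.read_csv, printing the HTTP status before re-raising on failure."""
    try:
        return pd.read_csv(source)
    except urllib.error.HTTPError as err:
        print(describe_http_error(err))
        raise
```

A 4xx status such as 403 would point at the server refusing the request, not at a problem with the CSV itself.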

Asked by Sophia

2 Answers


I like to download the file with requests and then hand the text to pandas:

from io import StringIO

import pandas as pd
import requests


def get_data() -> pd.DataFrame:
    url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"

    with requests.Session() as session:
        response = session.get(url, timeout=30)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()

    # Wrap the decoded body in a file-like object so pd.read_csv can parse it
    return pd.read_csv(StringIO(response.text), sep=",")


print(get_data())
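The parsing step works because `pd.read_csv` accepts any file-like object, so `StringIO` lets it read the downloaded text without writing to disk. Here is a minimal offline sketch of just that step. The CSV sample below is hypothetical and stands in for `response.text`; it is not the real file's contents. `frame_from_csv_text` is a hypothetical helper name.

```python
from io import StringIO

import pandas as pd


def frame_from_csv_text(text: str) -> pd.DataFrame:
    # Wrap the string in an in-memory file so pd.read_csv can consume it
    return pd.read_csv(StringIO(text))


# Hypothetical sample standing in for response.text
sample = "date,open,close\n2020-01-02,100.0,101.5\n2020-01-03,101.5,99.75\n"
df = frame_from_csv_text(sample)
print(df.shape)  # (2, 3)
```

One design note: `response.text` is decoded with the encoding requests infers from the response headers. If that guess is wrong, passing `response.content` wrapped in `io.BytesIO` lets pandas do the decoding instead (UTF-8 by default).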
Answer by Jason Baker

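You should pass a browser-like User-Agent header through the storage_options argument (available since pandas 1.2). For HTTP(S) URLs, pandas forwards these key-value pairs as request headers: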

import pandas as pd

url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
# Sent in place of urllib's default User-Agent, which this server appears to reject
storage_options = {'User-Agent': 'Mozilla/5.0'}
df = pd.read_csv(url, storage_options=storage_options)

Taken from: https://stackoverflow.com/a/68816828/5304366
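For HTTP(S) URLs, pandas documents that `storage_options` entries are forwarded to `urllib.request.Request` as headers. The same request can therefore be built by hand, which helps on older pandas versions without `storage_options`. This is a sketch under that assumption. `build_request` and `read_csv_with_user_agent` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import urllib.request

import pandas as pd

URL = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    # Attach a browser-like User-Agent header to the request
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_csv_with_user_agent(url: str) -> pd.DataFrame:
    # urlopen returns a binary file-like response that pd.read_csv can read directly
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return pd.read_csv(response)
```

Note that urllib normalizes header names with `str.capitalize()`, so the header is stored as `User-agent`. Usage would be `df = read_csv_with_user_agent(URL)`.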

Answer by Adrien Pacifico