Retrieving CSV file from URL

Question

I’m trying to read a CSV file from a URL using PyScript (to eventually load it into pandas, which is not shown in my example). Following the example here, https://docs.pyscript.net/latest/guides/http-requests.html, I’m able to use pyfetch to retrieve the example payload, which appears to be JSON. However, I can’t seem to use it to retrieve a CSV file (or any other non-JSON payload).

An example is below. The first download works; the second does not.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
    <script defer src="https://pyscript.net/latest/pyscript.js"></script>
  </head>
  <body>
    <py-script>
        import asyncio
        import json
        
        from pyodide.http import pyfetch

        
        async def main():
            # This works
            url = "https://jsonplaceholder.typicode.com/posts/2"
            response = await pyfetch(url)
            print(f"status:{response.status}")

            # This does not work
            url = "https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv"
            response = await pyfetch(url)
            print(f"status:{response.status}")


        asyncio.ensure_future(main())
    </py-script>
  </body>
</html>

I should note, I did find a couple pyscript tutorials on csv files specifically; however, they appear to use deprecated approaches.

score 1 · Accepted Answer · answered May 01 '23 at 16:53

PyScript has a built-in feature, called [[fetch]] configurations, for loading remote files into the in-browser filesystem. These remove the need to rely on open_url directly.

A fetch configuration can take several keys. In your case, the from key fetches the content from a given URL and places it in a folder in the in-browser filesystem (which Python has access to), in a file named after the last part of that URL's path (in this case, turtles.csv).

You can also use the to_file key to specify a different filename, or the to_folder key to place that file within a different folder in the in-browser filesystem.

<py-config>
    packages = [
        "numpy",
        "pandas",
        "jinja2"
    ]
    [[fetch]]
    from = 'https://raw.githubusercontent.com/fomightez/pyscript_test/main/turtles.csv'
</py-config>

<py-script>
    import pandas as pd

    df = pd.read_csv('turtles.csv')
    Element("pandas-output").element.style.display = "block"
    display(df.head().style.format(precision=2), target="pandas-output-inner", append=False)
</py-script>
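
To make the naming rule above concrete, here is a minimal sketch in plain Python of how the destination path follows from the URL and the optional to_file and to_folder keys. The helper fetch_destination is hypothetical and written only for illustration; it is not PyScript's own code, and it assumes the default folder is the current directory.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit


def fetch_destination(url: str, to_file: Optional[str] = None, to_folder: str = ".") -> str:
    """Hypothetical helper: where a fetched file would land, per the rule described above."""
    # Default filename: the last segment of the URL's path (any query string is ignored).
    filename = to_file or PurePosixPath(urlsplit(url).path).name
    # Join folder and filename using POSIX-style paths, as in the in-browser filesystem.
    return str(PurePosixPath(to_folder) / filename)


url = "https://raw.githubusercontent.com/fomightez/pyscript_test/main/turtles.csv"
print(fetch_destination(url))
print(fetch_destination(url, to_file="data.csv", to_folder="datasets"))
```

With the defaults, the result is the bare filename, which is why pd.read_csv('turtles.csv') finds the file in the answer's example.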

this is nice; thank you. Do you have any idea as to why the original url does not work, https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv? — Chris, May 03 '23 at 15:46
It looks like the server that stores that file doesn't have the appropriate headers to allow general access. When using that URL, in the browser dev console I see: `Access to fetch at 'https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv' from origin 'http://127.0.0.1:5501' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.` You can read more about this CORS issue here: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSMissingAllowOrigin — Jeff Glass, May 03 '23 at 16:13
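
As a rough illustration of the check behind that console message, the sketch below tests whether a set of response headers would let a page from a given origin read the response. It is a simplification that assumes a simple, credential-free request; real browsers also handle preflight requests and credentials. The function name cors_allows and the sample headers are made up for the example.

```python
from typing import Mapping


def cors_allows(response_headers: Mapping[str, str], origin: str) -> bool:
    """Simplified check: would a browser let a page at `origin` read this response?"""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: this is the situation reported in the error above.
        return False
    return allowed == "*" or allowed == origin


page_origin = "http://127.0.0.1:5501"
print(cors_allows({"Content-Type": "text/csv"}, page_origin))
print(cors_allows({"Access-Control-Allow-Origin": "*"}, page_origin))
```

This also explains why the same URL works from a regular Python kernel: the restriction is enforced by the browser, not by the server refusing the request.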

score 0 · Answer 2 · answered Apr 21 '23 at 04:18

Clicking this link downloads the CSV to the local machine, so the file exists: https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv

The following code worked:

import pandas as pd

URL = "https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv"
data = pd.read_csv(URL)
print(data)

MuthuSankaraNarayanan Valliamm

Thank you for the answer; however, my understanding (and experience) is that you can't use that approach in pyscript because pyodide does not have the requests module. – Chris Apr 21 '23 at 13:21

Wayne · Answer 3 · 2023-04-21T20:18:13.303

Hmmm... the download from your site is tricky. I cannot see why it won't work with open_url when your site behaves the same as https://s3.eu-west-1.amazonaws.com/assets.holoviews.org/data/nyc_taxi_wide.csv when I access each in my browser. And https://s3.eu-west-1.amazonaws.com/assets.holoviews.org/data/nyc_taxi_wide.csv works with open_url. That would seem to rule out a Cross-Origin Resource Sharing (CORS) problem.

And it is even odder since, as MuthuSankaraNarayanan Valliamm points out, data=pd.read_csv('https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv') works when run in a typical, full Python kernel.

And the following based on here works in a typical Python kernel, too:

# based on https://stackoverflow.com/a/35371451/8508004
import csv
import requests

CSV_URL = 'https://people.math.sc.edu/Burkardt/datasets/csv/turtles.csv'


with requests.Session() as s:
    download = s.get(CSV_URL)

    decoded_content = download.content.decode('utf-8')
    print(decoded_content)

    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)
    for row in my_list:
        #print(row)
        pass

Maybe someone will point out how to make your source work with pyscript? For now I moved your data to GitHub.

Below is an example that links to the actual CSV data URL where I moved your data to GitHub.

Solution: use `open_url` to get CSV

Often the PyScript demonstrations include more useful examples than the higher-level guides, such as https://docs.pyscript.net/latest/guides/http-requests.html . Getting a CSV boils down to df = pd.read_csv(open_url(url)), as those demonstrations show.
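
The one-liner works because open_url returns an in-memory text buffer (an io.StringIO), and CSV readers accept any such file-like object. Below is a minimal sketch of that idea that runs outside the browser: a hand-made StringIO stands in for the result of open_url(url), and the standard csv module stands in for pandas. The sample columns are invented for the example and are not the contents of turtles.csv.

```python
import csv
import io

# Stand-in for open_url(url): made-up sample text wrapped in a StringIO buffer.
fake_open_url_result = io.StringIO("name,length\nA,1.5\nB,2.0\n")

# Any reader that accepts a file-like object can consume the buffer directly.
rows = list(csv.DictReader(fake_open_url_result))
print(len(rows), rows[0]["name"])
```

In PyScript, the object returned by open_url can be handed to pd.read_csv in the same way, with no intermediate file.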

They have an example getting a CSV you can adapt

Importantly for you, among the examples listed on the PyScript demos page there is one called NYC Taxi Data Panel DeckGL Demo that includes getting data as a CSV file. It is presently found under the 'Visualizations & Dashboards' section, way at the bottom of the page:

link to 'NYC Taxi Data Panel DeckGL Demo' pyscript example running served on web: here
link to the code for 'NYC Taxi Data Panel DeckGL Demo' pyscript example on Github: here

Adapting your code to that example to focus on getting the CSV:

See the adapted version running on the web here.
The adapted code:

<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <meta name="apple-mobile-web-app-capable" content="yes" />
    <meta name="apple-mobile-web-app-status-bar-style" content="default" />
    <meta name="theme-color" content="#0072b5" />
    <meta name="name" content="CSV Getting Demo adapted from PyScript/Panel DeckGL Demo" />

    <title>CSV Getting Demo</title>
    <link rel="icon" type="image/x-icon" href="./favicon.png" />
    <link
        rel="stylesheet"
        href="https://pyscript.net/latest/pyscript.css"
    />
    <script defer src="https://pyscript.net/latest/pyscript.js"></script>
    <link rel="stylesheet" href="https://pyscript.net/examples/assets/css/examples.css" />
</head>
<body>
    <div id="pandas-output" hidden>
        <h3>Output</h3>
        <div id="pandas-output-inner"></div>
    </div>
    <py-tutor>
            <py-config>
                packages = [
                  "numpy",
                  "pandas",
                  "jinja2"
                ]
                plugins = [
                  "https://pyscript.net/latest/plugins/python/py_tutor.py"
                ]
            </py-config>

            <py-script>
                import pandas as pd

                from pyodide.http import open_url

                url = 'https://raw.githubusercontent.com/fomightez/pyscript_test/main/turtles.csv'
                df = pd.read_csv(open_url(url))
                Element("pandas-output").element.style.display = "block"
                display(df.head().style.format(precision=2), target="pandas-output-inner", append=False)
            </py-script>
        </py-tutor>
</body>
</html>

You can run that right here by clicking the 'Run code snippet' button. (Because of the small area and the dataframe being displayed at the top of a section, I found I needed to then click the 'Full page' view option that comes up on the right side in order to see the dataframe well. You can get back to the post by clicking 'Close' in the upper right of the full page view.)

This is great; thank you @Wayne. The unlock for me was the pyodide open_url. — Chris, Apr 21 '23 at 17:11
Hmmm....so it works for you? Maybe it is CORS. I'm not having luck getting your link to work from code running here or on GitHub. — Wayne, Apr 21 '23 at 17:14
I should add, the source of the data seems to have created an unneeded complication. I had chosen it somewhat at random for the example. — Chris, Apr 21 '23 at 17:17
Hmmm...but `https://s3.eu-west-1.amazonaws.com/assets.holoviews.org/data/nyc_taxi_wide.csv` acts the same as yours but maybe Amazon buckets allow CORS. Weird. Glad the example with `open_url` helped in the end. That's definitely the main point, to use `open_url`. — Wayne, Apr 21 '23 at 17:22
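
For completeness, the pyfetch route from the question can also handle text payloads: the response object exposes an awaitable string() method alongside json(). The sketch below shows the pattern in a form that runs outside the browser. FakeResponse and fake_fetch are hypothetical stand-ins with made-up data; in PyScript you would pass pyodide.http.pyfetch as the fetch argument instead. This does not get around CORS, so the original math.sc.edu URL would still fail in the browser.

```python
import asyncio
import csv
import io


class FakeResponse:
    """Hypothetical stand-in for the response object that pyfetch returns."""

    status = 200

    async def string(self) -> str:
        # Made-up CSV text; a real response would return the body of the URL.
        return "id,value\n1,a\n2,b\n"


async def fake_fetch(url: str) -> FakeResponse:
    """Hypothetical stand-in for pyodide.http.pyfetch."""
    return FakeResponse()


async def fetch_csv_rows(fetch, url: str) -> list:
    response = await fetch(url)
    # Read the body as text rather than JSON, then parse it as CSV.
    text = await response.string()
    return list(csv.reader(io.StringIO(text)))


rows = asyncio.run(fetch_csv_rows(fake_fetch, "https://example.com/data.csv"))
print(rows)
```

The same text could be wrapped in io.StringIO and passed to pd.read_csv to get a DataFrame.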
