
I want to import a public dataset from Kaggle (https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv) into a local Jupyter notebook. I don't want to use any credentials in the process.

I have seen various solutions, including pd.read_html, pd.read_csv, and pd.read_table (pd = pandas). I also found solutions that require a login.

The first set of solutions is the one I am interested in, but they seem to work on other websites only because there is a link to the raw data. I have been clicking everywhere in the Kaggle interface but cannot find a direct URL to the raw data.

Bottom line: is it possible to use, say, pd.read_csv to load the data directly from the website into a local notebook? If so, how?

Sapiens
  • Show us what you tried and explain how it failed to meet your needs. – Paul H Sep 16 '21 at 23:22
  • It is usually possible to use `import pandas as pd; df = pd.read_csv(url)` directly. – Felipe Whitaker Sep 17 '21 at 00:02
  • With that you get a table containing the HTML headers from the page. The data is not even in the output. It works if you have the raw data page, which I can't find for Kaggle datasets... I saw that command used successfully with a GitHub URL pointing directly at a dataset. – Sapiens Sep 22 '21 at 19:21
  • Does this answer your question? [Import Kaggle csv from download url to pandas DataFrame](https://stackoverflow.com/questions/43516982/import-kaggle-csv-from-download-url-to-pandas-dataframe) – Aidan Feldman Feb 27 '23 at 21:19

1 Answer

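import sys
from zipfile import ZipFile

import kaggle.cli
import pandas as pd

# Download the dataset archive through the Kaggle CLI entry point.
# Note: the kaggle package expects API credentials to be configured
# (e.g. ~/.kaggle/kaggle.json).
# https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv
dataset = "unsdsn/world-happiness"
sys.argv = [sys.argv[0]] + f"datasets download {dataset}".split(" ")
kaggle.cli.main()

# The archive is saved as <dataset name>.zip in the working directory.
zfile = ZipFile(f"{dataset.split('/')[1]}.zip")

# Read every file in the archive into a DataFrame, keyed by file name.
dfs = {f.filename: pd.read_csv(zfile.open(f)) for f in zfile.infolist()}

dfs["2017.csv"]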
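The dictionary comprehension above assumes that every entry in the archive is a CSV file. As a minimal sketch of a more defensive variant, the hypothetical helper `read_csvs_from_zip` below skips directories and non-CSV entries. To keep it runnable offline and without dependencies, it parses with the standard library `csv` module (the stand-in `rows_from_csv` is also a made-up name) and is demonstrated on a small in-memory archive whose rows are invented for illustration.

```python
import csv
import io
from zipfile import ZipFile


def rows_from_csv(binary_file):
    """Parse a binary CSV stream into a list of dicts (stdlib stand-in for pd.read_csv)."""
    text = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
    return list(csv.DictReader(text))


def read_csvs_from_zip(archive, reader=rows_from_csv):
    """Return {file name: parsed table} for every .csv entry in the archive."""
    tables = {}
    with ZipFile(archive) as zfile:
        for info in zfile.infolist():
            # Skip directories and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with zfile.open(info) as fh:
                tables[info.filename] = reader(fh)
    return tables


# Build a tiny in-memory archive with made-up rows so the sketch runs offline.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("2017.csv", "Country,Score\nA,7.5\nB,7.4\n")
    zf.writestr("README.txt", "not a table")
buffer.seek(0)

tables = read_csvs_from_zip(buffer)
print(sorted(tables))
print(tables["2017.csv"][0]["Country"])
```

With pandas installed, passing `reader=pd.read_csv` should return DataFrames instead, since `pd.read_csv` accepts a binary file-like object. The `archive` argument can also be the path of the downloaded zip file.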
Rob Raymond