I have a list of URLs to download data from, and I am working on Kaggle. I want to know how to download this data and save it on Kaggle or on my local machine. The goal is to load the data into Python, combine it into a single CSV file, and download that one large file. Currently, each URL corresponds to one year of data.

Ref: Download Returned Zip file from URL

My code:

    import io
    import zipfile

    import pandas as pd
    import requests

    url_list = [
        'https://mapfiles.nrel.gov/data/solar/ae014839fbbe9de5c30bedf56a2f5521.zip',
        'https://mapfiles.nrel.gov/data/solar/ea8f39523778ba0223a28116a3e9d85a.zip',
    ]

    data_list = []
    for url in url_list:
        r = requests.get(url)
        z = zipfile.ZipFile(io.BytesIO(r.content))
        # Read the first file in each archive as a CSV
        data_list.append(pd.read_csv(z.open(z.namelist()[0])))

    # Create a big dataframe
    df = pd.concat(data_list)
    df.to_csv('WeatherData.csv')

It works as I intended, but is there a better way of doing it?

1 Answer

First of all, I'm not sure which aspect you want to improve. But based on your code, I think there are a couple of ways to improve it:

  1. Use a context manager to handle the zip file object. It will be closed automatically and properly once the operation completes.
  2. Check the HTTP response status code in case the download from the source URL failed. You can use an if-else check or a try-except block.

With both changes applied:

    import io
    import zipfile

    import pandas as pd
    import requests

    url_list = [
        "https://mapfiles.nrel.gov/data/solar/ae014839fbbe9de5c30bedf56a2f5521.zip",
        "https://mapfiles.nrel.gov/data/solar/ea8f39523778ba0223a28116a3e9d85a.zip",
    ]

    data_list = []
    for url in url_list:
        resp = requests.get(url)
        # Stop early if the download failed
        if resp.status_code != 200:
            raise Exception(f"Unable to download file: {url}")
        # The context manager closes the archive when the block exits
        with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
            data_list.append(pd.read_csv(z.open(z.namelist()[0])))

    # Combine the yearly frames and write a single CSV
    df = pd.concat(data_list)
    df.to_csv('WeatherData.csv')
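
If you prefer the try-except route over the status code check, here is a rough sketch of how it might look. The helper names `read_first_csv` and `download_frames` and the 60-second timeout are my own choices for illustration, not anything from your code. It assumes you would rather collect the failed URLs and keep going than stop at the first error.

```python
import io
import zipfile

import pandas as pd
import requests


def read_first_csv(zip_bytes):
    """Read the first member of an in-memory zip archive as a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_member = archive.namelist()[0]
        with archive.open(first_member) as csv_file:
            return pd.read_csv(csv_file)


def download_frames(urls, timeout=60):
    """Download each zip and return (frames, failed).

    URLs that cannot be fetched or unzipped are recorded in `failed`
    together with the exception, instead of aborting the whole run.
    """
    frames, failed = [], []
    for url in urls:
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
            frames.append(read_first_csv(resp.content))
        except (requests.RequestException, zipfile.BadZipFile) as exc:
            failed.append((url, exc))
    return frames, failed
```

You would call it as `frames, failed = download_frames(url_list)` and then check `failed` before running `pd.concat(frames)`, so a missing year does not go unnoticed.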

Beyond that, I guess you could also use a list comprehension instead of looping over url_list and appending to data_list one item at a time. Hope this answers your question.
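
A minimal sketch of the list comprehension version, in case it helps. The helper names `fetch_zip`, `read_first_csv` and `combine`, the `fetch` parameter and the timeout value are hypothetical and only there to keep the comprehension readable. I also pass `ignore_index=True` to `pd.concat`, which is a change from your code: it renumbers the rows from 0 instead of repeating each year's index.

```python
import io
import zipfile

import pandas as pd
import requests


def fetch_zip(url, timeout=60):
    """Download one archive and return its raw bytes."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # fail loudly on a 4xx/5xx response
    return resp.content


def read_first_csv(zip_bytes):
    """Read the first member of an in-memory zip archive as a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(archive.namelist()[0]) as csv_file:
            return pd.read_csv(csv_file)


def combine(urls, fetch=fetch_zip):
    """Build one DataFrame from all archives using a list comprehension."""
    frames = [read_first_csv(fetch(url)) for url in urls]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Usage would be `combine(url_list).to_csv('WeatherData.csv', index=False)`, where `index=False` leaves the index column out of the file. Whether this is better than the explicit loop is mostly a matter of taste, since the loop is easier to extend with per-URL error handling.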