
I would like to know the most efficient way to test whether a large file exists locally (without loading it into memory). If it doesn't exist (or isn't readable), then download it. The goal is to load the data into a pandas DataFrame.

I wrote the snippet below, which works (tested with a small file). What about correctness and Pythonic style?

import os
import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"  # 4.7 kB
file = "./test_file.csv"

try:
    os.open(file, os.O_RDONLY)
    df_data = pd.read_csv(file, index_col=0)

except:
    df_data = pd.read_csv(url, index_col=0)
    df_data.to_csv(file)
alEx
  • You can pass `nrows=1` and then check the df.shape or length, so this will just read a single row – EdChum May 15 '17 at 08:34
    To check whether a file exists, see http://stackoverflow.com/questions/82831/how-do-i-check-whether-a-file-exists-using-python — put an os.path.isfile check before downloading and reading into a df, and in your except handle errors more specific to the file containing invalid characters that cause problems when loading into a df. – Satyadev May 15 '17 at 08:36
    `import os.path` then `os.path.isfile(fname)` will return True if the file exists – Nuageux May 15 '17 at 08:37
  • `os.path.isfile(file)` seems to be the best solution for checking before downloading a huge file: `if not os.path.isfile(file):` – alEx May 15 '17 at 15:36
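
Pulling the comments together, here is a minimal sketch (the helper name is my own) that checks both existence and readability with `os.path.isfile` and `os.access` before touching the file's contents, falling back to the URL and caching the download:

```python
import os

import pandas as pd

def load_csv_cached(path, url):
    """Read a CSV from disk if it exists and is readable;
    otherwise download it with pandas and cache it locally.
    isfile/access only stat the file, so nothing is loaded
    into memory for the check."""
    if os.path.isfile(path) and os.access(path, os.R_OK):
        return pd.read_csv(path, index_col=0)
    df = pd.read_csv(url, index_col=0)
    df.to_csv(path)
    return df
```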

3 Answers


I think you can use try/except and catch FileNotFoundError:

import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"  # 4.7 kB
file = "./test_file.csv"

try:
    df_data = pd.read_csv(file, index_col=0)

except FileNotFoundError: 
    df_data = pd.read_csv(url, index_col=0)
    df_data.to_csv(file)
jezrael
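
A variant of this answer (my own sketch, wrapped in a hypothetical helper) that also covers the "not readable" case from the question by catching PermissionError alongside FileNotFoundError:

```python
import pandas as pd

def read_csv_or_fetch(path, url):
    # Fall back to the URL if the local file is missing
    # (FileNotFoundError) or unreadable (PermissionError),
    # then cache the downloaded data to disk.
    try:
        return pd.read_csv(path, index_col=0)
    except (FileNotFoundError, PermissionError):
        df = pd.read_csv(url, index_col=0)
        df.to_csv(path)
        return df
```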

You can check whether the file exists, and load it from the URL if it does not:

import os
import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
f = "./test.csv"

if os.path.exists(f):
    df = pd.read_csv(f)
else:
    df = pd.read_csv(url)
Robbie

`os.path.isfile(file)` seems to me the best solution: check before downloading a huge file:

import os
import urllib  # Python 2: urlretrieve is top-level; Python 3 moved it to urllib.request

import pandas as pd

if not os.path.isfile(file):
    urllib.urlretrieve(url, file)
df_data = pd.read_csv(file, index_col=0)

It's slower than loading it directly into memory from the URL (download to disk, then load into memory), but safer in my situation...
Thanks to all
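
For Python 3, `urlretrieve` moved to `urllib.request`; a sketch of the same download-then-read pattern, wrapped in a hypothetical helper:

```python
import os
import urllib.request

import pandas as pd

def fetch_then_read(url, path):
    """Download the file to disk only if it is missing
    (Python 3: urlretrieve lives in urllib.request),
    then read it from disk."""
    if not os.path.isfile(path):
        urllib.request.urlretrieve(url, path)
    return pd.read_csv(path, index_col=0)
```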

alEx