
I'm downloading about 800 files of trading data using the requests library in Python. The file names of interest follow the pattern "icecleared_power_YYYY_mm_dd.dat", but some files are empty or don't exist on the server; the response body still gets written to disk regardless. My question is: how can I ignore those files that are, or would be, below a certain size?

My current code downloads all files and at the end deletes those that surely have no content:

    import os
    import requests
    from glob import glob

    path = 'Data/Futures/ICE/'

    for d in dates:
        file_name: str = 'icecleared_power_' + str(d.date()).replace('-', '_') + '.dat'
        url: str = 'https://downloads.theice.com/Settlement_Reports_CSV/Power/' + file_name
        resp = requests.get(url, auth=('username', 'password'))

        temp = open(path + file_name, 'wb')
        temp.write(resp.content)
        temp.close()

    [os.remove(x) for x in glob(path + '*.dat') if os.path.getsize(x) < 100 * 1024]
CasusBelli
    Couldn't you simply check `len(resp.content)` to determine whether it's above your threshold before writing? Also worth noting - you should _not_ be using a list comprehension for side-effects. You should use a `for` loop for that use case. Ref: https://stackoverflow.com/questions/5753597/is-it-pythonic-to-use-list-comprehensions-for-just-side-effects – g.d.d.c Aug 14 '20 at 19:27
  • `.memory_usage()` may also be helpful: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.memory_usage.html – jsmart Aug 14 '20 at 19:46
  • @jsmart: wouldn't this require a read_csv command to read the DataFrame? Or is the DataFrame automatically inferred / loaded once the file is downloaded? – CasusBelli Aug 14 '20 at 22:56
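A minimal sketch of the approach from g.d.d.c's comment: check `len(resp.content)` against the threshold before writing, so empty or undersized responses are never saved to disk in the first place. The URL, credentials, and 100 KiB cutoff are taken from the question; the helper names `worth_saving` and `download` are hypothetical, and `dates` is assumed to be an iterable of datetime-like objects as in the original loop.

```python
import os
import requests

MIN_SIZE = 100 * 1024  # same 100 KiB cutoff as the original cleanup step


def worth_saving(content: bytes, min_size: int = MIN_SIZE) -> bool:
    """Return True if the response body is large enough to keep."""
    return len(content) >= min_size


def download(dates, out_dir='Data/Futures/ICE/'):
    for d in dates:
        file_name = 'icecleared_power_' + str(d.date()).replace('-', '_') + '.dat'
        url = 'https://downloads.theice.com/Settlement_Reports_CSV/Power/' + file_name
        resp = requests.get(url, auth=('username', 'password'))
        # Only write files that exist (HTTP 200) and clear the size threshold;
        # everything else is skipped, so no cleanup pass is needed afterwards.
        if resp.status_code == 200 and worth_saving(resp.content):
            with open(os.path.join(out_dir, file_name), 'wb') as f:
                f.write(resp.content)
```

Filtering on the response body before writing also removes the need for the `glob`/`os.remove` cleanup at the end, since undersized files never touch the filesystem.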

0 Answers