
I'm working on a scraper that pulls files down from a website and then parses them for the end goal. The parser keeps failing when it reaches a 0-byte file (as it should). Is there a way to avoid saving 0-byte files when they are extracted?

I don't have a code example, but what I'm doing is creating a temp folder with os.mkdir and storing the files there until they are parsed. I'm pulling them with xml.etree.ElementTree. Some pseudocode:

import os
import xlrd

# pretend parse function is here
os.mkdir(r'C:\TEMPFILES_TO_PARSE')

for entry in filepaths:  # list of downloaded file paths
    wb = xlrd.open_workbook(entry)
    # begin parse function(s)

tl;dr I would like to not save 0-byte files, to avoid error flags.

nos codemos
    https://stackoverflow.com/questions/2104080/how-to-check-file-size-in-python might be useful. For me, I simply check if the file length is 0 and don't run it. You could also have a try/except clause to skip 0-byte files. – Jason Chia Feb 12 '20 at 14:36
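The try/except alternative mentioned in the comment could be sketched like this. The helper name `parse_or_skip` and the `parse` callable are hypothetical stand-ins; with xlrd, the exception raised on a bad file would typically be `xlrd.XLRDError`, but `Exception` is caught here to keep the sketch library-agnostic:

```python
def parse_or_skip(path, parse):
    """Run parse(path); skip files (e.g. 0-byte files) that make it raise."""
    try:
        return parse(path)
    except Exception as exc:  # e.g. xlrd.XLRDError on an empty workbook
        print(f'skipping {path}: {exc}')
        return None
```

Catching a broad `Exception` will also swallow unrelated parse errors, so the explicit size check in the answer below the question is usually the cleaner option.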

1 Answer


If your script fails when it reaches a 0-byte file, you can add an if condition that checks the file size:

import os

file_size = os.path.getsize('yourfile.txt')

if file_size != 0:
    # do something here
J.K.