1

We prepared the following Python script (Python 2.7) to make histograms.

histogram.py

#!/usr/bin/env python
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.mlab as mlab
mpl.use('Agg')
import matplotlib.pyplot as plt

sys.argv[1]  # Define input name
sys.argv[2]  # Define output name
sys.argv[3]  # Define title

# Open the file name called "input_file"
input_file=sys.argv[1]
inp = open (input_file,"r")
lines = inp.readlines()
if len(lines) >= 20:
    x = []
    #numpoints = []
    for line in lines:
#  if int(line) > -10000:  # Activate this line if you would like to filter the data (filter out values smaller than -10000 here)
            x.append(float(line))
# the histogram of the data
            n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
            plt.xlabel('Differences')
            numpoints = len(lines)
            plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ' )
            title=sys.argv[3]
            plt.title(title)
            plt.grid(True)
            save_file=sys.argv[2]
            plt.savefig(save_file+".png")
            plt.clf()
inp.close()

example: input

1
2
3

The script is run as follows:

python histogram.py input ${output_file_name}.png ${title_name}

We added the line "if len(lines) >= 20:" so that no plot is made when there are fewer than 20 data points.

However, if the file is empty, this Python script freezes.

We added a bash line to remove any empty files before running "python histogram.py input ${output_file_name}.png ${title_name}":

find . -size 0 -delete

For some reason, this line always works in small-scale tests but not in real production runs inside several loops. So we would love to make the "histogram.py" ignore any empty files if possible.

Searching only turned up this link, which doesn't seem to be very helpful : (

Ignoring empty files from coverage report

Could anyone kindly offer some comments? Thanks!

Chubaka
  • Could you try [this](http://stackoverflow.com/questions/2507808/python-how-to-check-file-empty-or-not)? – Lucas Virgili Jul 10 '14 at 20:48
  • Your script works for me (that is, it doesn't hang or process output for the input file specified). Something else is going on. – tdelaney Jul 10 '14 at 20:57
  • [My experiments](http://pastebin.com/50H0wtpu) indicate that your hypothesis is incorrect. An empty file should cause an immediate normal termination of the program. – holdenweb Jul 10 '14 at 21:00
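
The point made in the comments above can be checked with a minimal sketch. The helper name `should_plot` and its `min_points` parameter are hypothetical (they are not in the original script); the function only mirrors the question's `len(lines) >= 20` guard. For an empty file, `readlines()` returns an empty list, so the guard is false and no plotting code runs:

```python
import os
import tempfile

def should_plot(path, min_points=20):
    """Mirror the question's guard: plot only if the file has enough lines."""
    with open(path, "r") as inp:
        lines = inp.readlines()  # [] for an empty file
    return len(lines) >= min_points

# Create an empty temporary file and apply the guard to it.
fd, empty_path = tempfile.mkstemp()
os.close(fd)
print(should_plot(empty_path))  # False
os.remove(empty_path)
```

If the script really does hang in production, this suggests the cause lies elsewhere than in the handling of empty files.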

2 Answers

2

Check whether input_file is non-empty with os.path.getsize(input_file) > 0.

os.path.getsize

You will need the full path, which I presume you will have. It raises an error if the file does not exist or is inaccessible, so you may want to handle those cases.
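
One way to handle those cases, as a minimal sketch: `os.path.getsize` raises `OSError` when the path is missing or unreadable, so the check can be wrapped in `try`/`except` and treated as "nothing to plot". The helper name `is_nonempty_file` is hypothetical and not part of the original script.

```python
import os

def is_nonempty_file(path):
    """Return True only if path is accessible and its size is greater than 0."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        # Raised when the file does not exist or is inaccessible.
        return False
```

The script could then test `is_nonempty_file(input_file)` in place of the bare `os.path.getsize(input_file) > 0` check.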

This code works, ignoring empty files:

#!/usr/bin/env python
import os
import sys
import matplotlib as mpl
mpl.use('Agg')  # Select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

input_file = sys.argv[1]  # Input file name
save_file = sys.argv[2]   # Output file name (".png" is appended)
title = sys.argv[3]       # Plot title

if os.path.getsize(input_file) > 0:
    # Read all lines; the "with" block closes the file automatically
    with open(input_file, "r") as inp:
        lines = inp.readlines()
    if len(lines) >= 20:
        x = []
        for line in lines:
            # if int(line) > -10000:  # Activate this line to filter the data (filters out values smaller than -10000 here)
            x.append(float(line))
        # The histogram of the data, drawn once after all values are read
        n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
        plt.xlabel('Differences')
        numpoints = len(lines)
        plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ')
        plt.title(title)
        plt.grid(True)
        plt.savefig(save_file + ".png")
        plt.clf()
else:
    print("Empty file")


~$ python test.py empty.txt foo bar
   Empty file
Padraic Cunningham
  • How is this better than the method he is already using? Since his method should work there is something else going on and simply adding a different check may not help. – tdelaney Jul 10 '14 at 21:13
  • @tdelaney, the OP has said the problem happens when there is an empty file so checking for an empty file initially would seem logical to most people. "we would love to make the "histogram.py" ignore any empty files if possible" – Padraic Cunningham Jul 10 '14 at 21:16
  • I just added "os.path.getsize(input_file) > 0" right after "input_file=sys.argv[1]" and tested with an empty file: "python histogram.py empty.csv empty empty". No warning is generated. Any comments? Thanks. – Chubaka Jul 15 '14 at 07:53
  • I use python 2.7 by the way. – Chubaka Jul 15 '14 at 07:55
  • @Chubaka, yes. What does your code do afterwards if the file is empty? – Padraic Cunningham Jul 15 '14 at 09:57
  • Just print out a warning and bypass it : ) – Chubaka Jul 15 '14 at 18:28
  • so you have `if os.path.getsize(input_file) > 0:do something...` and if not don't continue? – Padraic Cunningham Jul 15 '14 at 18:39
-1

Check that the file exists and is not empty beforehand.

import os
def emptyfile(filepath):
    # Despite the name, returns True when filepath is an existing file that is NOT empty
    return os.path.isfile(filepath) and os.path.getsize(filepath) > 0
wyas
  • How does this check if the file has 20 or more lines? – tdelaney Jul 10 '14 at 20:59
  • It doesn't. This is only to check beforehand if the file exists and to make sure it's not empty. OP will need to check if the file has 20 or more lines later on. – wyas Jul 10 '14 at 21:01
  • Yeah, but that's what his script (which works when I test it) already does. – tdelaney Jul 10 '14 at 21:02
  • I didn't test the file, however I'm not sure I follow. OP says "However, if the file is empty, this python script will be freeze.", so the logical solution is to check if the file is empty before hand. – wyas Jul 10 '14 at 21:04
  • The OP already has the right checks for an empty file - there is something else going on. Throwing extra code at it doesn't solve the underlying problem. – tdelaney Jul 10 '14 at 21:09
  • Ah I see now. I didn't read the file initially, thanks for pointing it out. – wyas Jul 10 '14 at 21:18