I need a little help with something. I wrote some code that imports Excel files from a directory and bins the data into bins of width 5: [0,5), [5,10), etc. Every time a number falls inside a bin, that bin's count goes up. Everything works and does what I need, but I would like to be able to vary the bin width as I please, and I'm having some difficulty doing so. The code is:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import openpyxl
from pandas import ExcelWriter
import os

datadir = '/Users/user/Desktop/Data/'

for file in os.listdir(datadir):
    if file.endswith('.xlsx'):
        data = pd.read_excel(os.path.join(datadir, file))
        # Bin edges 0, 5, 10, ... up to just past the largest value
        counts, bins, patches = plt.hist(data.values, bins=range(0, int(data.values.max()) + 5, 5))
        df = pd.DataFrame({'bin_leftedge': bins[:-1], 'count': counts})
        plt.title('Data')
        plt.xlabel('Neuron')
        plt.ylabel('# of Spikes')
        plt.show()

        # Write the bin edges and counts next to the input file
        outfile = os.path.join(datadir, file.replace('.xlsx', '_bins.xlsx'))
        with pd.ExcelWriter(outfile) as writer:
            df.to_excel(writer)

So this loops over all the files in the directory, bins each one, and exports the results as individual Excel files. I am pretty new to coding and would appreciate any help. I was thinking of making the bin width a command-line parameter, so that I could run the code standalone with a specific value, or have some other code call it with a value based on its own results. What would be the best way to go about this?

J. Espino

1 Answer

The range function takes a step argument (see the docs). In your call range(0, ..., 5), the step is the last argument, and it corresponds to the bin width.

Note: if you do not want to plot, but just compute the histogram, you can also use numpy.histogram.

Alternatively, you can define the bin width with the parameters bins (an integer number of bins n, instead of a sequence) and range (a tuple giving the bounds, (x_min, x_max)).

The bin width is then (x_max - x_min) / n.

If you want to fix the bin width, you can use a bit of algebra and compute the number of bins from the interval of your input data and the width. (Beware of variations introduced by rounding and truncating to integers.)
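The algebra above could be sketched as follows. This is only a minimal sketch, assuming the lower bound of the data is known; the helper names bin_count_for_width and bin_edges are made up for illustration and are not part of numpy or matplotlib.

```python
import math


def bin_count_for_width(x_min, x_max, width):
    """Number of equal-width bins needed to cover [x_min, x_max]."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Round up so the last bin still reaches x_max; always use at least one bin.
    # With non-integer inputs, floating-point rounding can shift this by one.
    return max(1, math.ceil((x_max - x_min) / width))


def bin_edges(x_min, width, n):
    """Left edge of every bin, followed by the right edge of the last bin."""
    return [x_min + i * width for i in range(n + 1)]


n = bin_count_for_width(0, 12, 5)
edges = bin_edges(0, 5, n)
print(n, edges)  # 3 [0, 5, 10, 15]
```

To keep the width exact when passing an integer bins=n, the upper bound given in range would have to be x_min + n * width (15 here) rather than the raw maximum of the data (12).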

felix the cat
  • Thanks, your answer was really helpful, but I actually need a command-line parameter to finish this. I had thought about your approach at the beginning, but unfortunately I need something that will run the code whether or not the parameter is present. Thank you so much – J. Espino Jul 05 '17 at 18:22
  • Within your script you can access command line arguments with `sys.argv`: https://stackoverflow.com/a/4118133/7306999 – Xukrao Jul 05 '17 at 21:45
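
A minimal sketch of the command-line idea discussed in the comments. It uses the standard-library argparse module rather than reading `sys.argv` by hand, so that the script runs whether or not the parameter is given. The option name --bin-width, the default of 5, and the helper names are assumptions chosen for illustration.

```python
import argparse

DEFAULT_BIN_WIDTH = 5  # assumed default: the width hard-coded in the question


def positive_int(text):
    """Convert a command-line string to a positive integer."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("bin width must be a positive integer")
    return value


def parse_bin_width(argv=None):
    """Return the bin width from argv, or the default if it is absent.

    With argv=None, argparse reads the arguments from sys.argv.
    """
    parser = argparse.ArgumentParser(description="Bin data from Excel files.")
    parser.add_argument("--bin-width", type=positive_int,
                        default=DEFAULT_BIN_WIDTH,
                        help="width of each histogram bin")
    return parser.parse_args(argv).bin_width


# Demonstration with explicit argument lists; a real script would call
# parse_bin_width() with no arguments so that sys.argv is used.
print(parse_bin_width([]))                     # 5
print(parse_bin_width(["--bin-width", "10"]))  # 10
```

In the loop from the question, the returned value would replace both hard-coded 5s, for example bins=range(0, int(data.values.max()) + bin_width, bin_width). Another program could then run the script with a computed value, e.g. via the subprocess module, or import the module and pass the width directly.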