I got some help in a previous question about this, but this is quite a different issue, so I thought a new question would be best.
Once a month I need to go through a very large CSV file. I would normally do this manually in Excel, but I now want to do it automatically in Python.
The CSV is structured like so:
[IDNUMBER],[DATE1],[DATE2],[DATE3],[STRING-OF-WHAT-HAPPENED],[DATE3 - DATE2 IN DAYS],[ORIGINAL-FILENAME]
What I need is essentially a printout of the following (writing it to a file is also fine, but I don't need to keep this data; I just plug it into some charts):
For each original-filename (which can be up to 1200 rows), I need an average of [DATE3 - DATE2 IN DAYS]. For example:
12345,2011-06-12,2011-07-01,2011-07-2,1,['1100.csv']
54321,2011-06-12,2011-07-01,2011-07-3,2,['1100.csv']
23452,2011-06-12,2011-07-01,2011-07-4,3,['1100.csv']
The average here would be 2, which is the number I need. It would also be helpful to know how many rows there are per file, which in this example would be 3.
Then it should move on to the next original-filename (the last item in each row), and so on until the end of the CSV.
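To make the goal concrete, here is a rough sketch of the kind of thing I'm after, not a finished solution. It assumes the filename is always the last field and the day difference is always the second-to-last field, that there is no header row, and that rows for one file don't need to be adjacent. The function name `average_days_per_file` is just something I made up for illustration.

```python
import csv
import io
from collections import defaultdict


def average_days_per_file(lines):
    """Return {filename: (row_count, average_days)} for an iterable of CSV lines."""
    # filename -> [row count, running sum of day differences]
    totals = defaultdict(lambda: [0, 0.0])
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        filename = row[-1].strip()  # assumption: filename is the last field
        days = float(row[-2])       # assumption: day difference is second to last
        totals[filename][0] += 1
        totals[filename][1] += days
    return {name: (count, total / count) for name, (count, total) in totals.items()}


# The three example rows from above, fed in as an in-memory file.
sample = io.StringIO(
    "12345,2011-06-12,2011-07-01,2011-07-2,1,['1100.csv']\n"
    "54321,2011-06-12,2011-07-01,2011-07-3,2,['1100.csv']\n"
    "23452,2011-06-12,2011-07-01,2011-07-4,3,['1100.csv']\n"
)
for name, (count, average) in average_days_per_file(sample).items():
    print(name, count, average)  # prints: ['1100.csv'] 3 2.0
```

For the real file, I'd presumably pass an open file object instead of the sample, for example `with open("data.csv", newline="") as f:` (where `data.csv` is a placeholder name), since `csv.reader` reads one line at a time and shouldn't need the whole file in memory.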
In Excel I would use AutoFilter, select each entry in that column in turn, and then average the [DATE3 - DATE2] column, but it's tedious and time-consuming.
Thanks!