1

I'm reading excel files and writing them out as csv. A couple of columns contain dates which are formatted as float number in excel. All those fields need to get converted to a proper datetime (dd/mm/YY) before I wrote to CSV. I found some good articles on how that works in general, but struggling to get that working for all rows in a opened sheet at once. (Newbie in Python)

Code looks like below for now:

wb = xlrd.open_workbook(args.inname)
    xl_sheet = wb.sheet_by_index(0)
    print args.inname
    print ('Retrieved worksheet: %s' % xl_sheet.name)
    print outname

    # TODO: Convert xldate.datetime from the date fileds to propper datetime

    output = open(outname, 'wb')
    wr = csv.writer(output, quoting=csv.QUOTE_ALL)

    for rownum in xrange(wb.sheet_by_index(0).nrows):
        wr.writerow(wb.sheet_by_index(0).row_values(rownum))

    output.close()

I'm sure i have to change the "for rownum ...." line but I'm struggling doing it. I tried several options, which all failed.

thanks

J Richard Snape
  • 20,116
  • 5
  • 51
  • 79
f0rd42
  • 1,429
  • 4
  • 19
  • 30

2 Answers2

3

You need to go through the row before you write it out to file, converting values. You are right to identify that it is near the for rownum line:

# You need to know which columns are dates before hand
# you can't get this from the "type" of the cell as they 
# are just like any other number

date_cols = [5,16,23]

... # Your existing setup code here #

# write the header row (in response to OP comment)
headerrow = wb.sheet_by_index(0).row_values(0)
wr.writerow(headerrow)

# convert and write the data rows (note range now starts from 1, not 0)
for rownum in xrange(1,wb.sheet_by_index(0).nrows):
    # Get the cell values and then convert the relevant ones before writing
    cell_values = wb.sheet_by_index(0).row_values(rownum)
    for col in date_cols:
        cell_values[col] = excel_time_to_string(cell_values[col])

    wr.writerow(cell_values)

Exactly what you put in your excel_time_to_string() function is up to you - the answer by @MarkRansom has a reasonable approach - or you could use the xlrd own package versions outlined in this answer.

For instance:

def excel_time_to_string(xltimeinput):
    return str(xlrd.xldate.xldate_as_datetime(xltimeinput, wb.datemode))

* EDIT *

In response to request for help in comments after trying. Here's a more error-proof version of excel_time_to_string()

def excel_time_to_string(xltimeinput):
    try:
        retVal = xlrd.xldate.xldate_as_datetime(xltimeinput, wb.datemode)
    except ValueError:
        print('You passed in an argument in that can not be translated to a datetime.')
        print('Will return original value and carry on')
        retVal = xltimeinput

    return retVal
Community
  • 1
  • 1
J Richard Snape
  • 20,116
  • 5
  • 51
  • 79
  • I cannot confirm this is working yet as I stumble across the problem that I obviously need to skip column 1 which contains the header. I get a "ValueError: invalid literal for int() with base 10: 'Vertragsbeginn'" Error – f0rd42 Feb 25 '15 at 13:58
  • 1
    yeah - sorry - should have put something in to skip row 1. That's what your error is telling you (trying to convert the string literal `'Vertragsbeginn'` won't work!). On the plus side - looks like you've got the column numbers right as Vertragsbeginn likely is a date unless my German is really bad. :) See my update. Note - you can put in conditions to skip more than one row just by altering the range that `rownum` iterates over. – J Richard Snape Feb 25 '15 at 14:05
  • Brilliant, I have found exactly the same code in the mean time on my own, glad I wan't too wring, because now I get: ValueError: invalid literal for int() with base 10: ''. I believe this is because not all cells actually contain a value, i.e. if there is no "contract end date" defined yet, the cell is blank – f0rd42 Feb 25 '15 at 15:30
  • 1
    You need to include an `if` test for blank cells - either in the `for` loop, or in whatever `excel_time_to_string()` function you've gone with. You could do this in a simple way (e.g. `if value != '':`), or do something more comprehensive like [put the translation code in a try-catch block](https://docs.python.org/2/tutorial/errors.html) with `except ValueError:` and make your code simply use the raw value if it can't do the translation. I can put the code in the answer if you want, but it might help you more to work out how yourself - let me know. – J Richard Snape Feb 25 '15 at 15:48
  • I tried several possible combinations now without any luck. As mentioned once or twice: I'm a complete newbie to programming (not just to python) and therefor a better editor than author :-) I'd appreciate some "additional" hints. I like the idea of finding it out on my own, which is what I prefer all the time it's just a bit of a nightmare meanwhile with this script. I guess you know what I mean .-) thanks – f0rd42 Feb 25 '15 at 16:59
  • No probs - we're here to learn. I'll add a nicer example for `excel_time_to_string()` so you can see what the error checking might look like. – J Richard Snape Feb 25 '15 at 17:04
2

The conversion from Excel to Python is quite simple:

>>> excel_time = 42054.441953
>>> datetime.datetime(1899,12,30) + datetime.timedelta(days=excel_time)
datetime.datetime(2015, 2, 19, 10, 36, 24, 739200)

Or to do the complete conversion to a string:

def excel_time_to_string(excel_time, fmt='%Y-%m-%d %H:%M:%S'):
    dt = datetime.datetime(1899,12,30) + datetime.timedelta(days=excel_time)
    return dt.strftime(fmt)

>>> excel_time_to_string(42054.441953)
'2015-02-19 10:36:24'
>>> excel_time_to_string(42054.441953, '%d/%m/%y')
'19/02/15'
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • @Marc, I know, that is what I have found as well amoung others, but I'm struggling to get that into the code mentioned above (i.e. doing the "magic" on each date filed found in every row). As mentioned, I'm a python newbie. Thanks – f0rd42 Feb 19 '15 at 18:18
  • 1
    @AndreDieball as mentioned in the `xlrd` documentation, there's no separate type for Excel dates, they're just numbers. The example I gave you is how to convert a floating point number into a Python `datetime`. I've added the code to convert it to a string. – Mark Ransom Feb 19 '15 at 21:26