1

I'm currently writing a python script that will take an arbitrary number of csv files and create .xls files from them. Unfortunately, some of these csv files have row counts greater than 65536, which means that they can't exist on one .xls sheet. What I would like to do is come up with a way to generate a new sheet when that number of rows is reached. For reference, here is the code I'm currently using:

import csv, xlwt, glob, ntpath

files = glob.glob("C:/Users/waldiesamuel/326/*.csv")
bold = xlwt.easyxf('font: bold on')

for i in files:
    org_file = open(i, 'r')
    reader = csv.reader((org_file), delimiter=",")
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet("SQL Results")

    path = ntpath.dirname(i)
    file = ntpath.basename(i)

    for rowi, row in enumerate(reader):

        for coli, value in enumerate(row):
            if coli == 0:
                sheet.write(rowi,coli,value,bold)
            else:
                sheet.write(rowi,coli,value)

    workbook.save(path + file + '.xls')

My thought is that around

for rowi, row in enumerate(reader):

I could use an if statement to check if row is greater than 65536, but I'm not sure how to create a new variable from there.

Edit:

I found a potential solution, which failed, and was explained by the answer. I'm including it here as an edit so everyone can follow the thought process:

So it appears that because xlwt checks to specifically make sure you're not adding more than 65536 rows, this might not be doable. I had come up with what I thought was a clever solution, by changing my sheet variable to a dict, like so:

sheet = {1: workbook.add_sheet("SQL Results")}

then initializing two variables to serve as counters:

sheet_counter = 1
dict_counter = 2

and then using that for a conditional within the first for loop that would reset the row index and allow xlwt to continue writing to a new sheet:

if rowi == 65536:
    sheet[dict_counter] = workbook.add_sheet("SQL Results (" + str(dict_counter) + ")")
    sheet_counter += 1
    dict_counter += 1
    rowi = 1
else:
    pass

Unfortunately, even doing so still causes xlwt to throw the following error when the row variable increments beyond 65536:

Traceback (most recent call last):
  File "xlstest.py", line 35, in <module>
    sheet[sheet_counter].write(rowi,coli,value,bold)
  File "C:\Users\waldiesamuel\AppData\Local\Programs\Python\Python35-32\lib\site-packages\xlwt\Worksheet.py", line 1088, in write
    self.row(r).write(c, label, style)
  File "C:\Users\waldiesamuel\AppData\Local\Programs\Python\Python35-32\lib\site-packages\xlwt\Worksheet.py", line 1142, in row
    self.__rows[indx] = self.Row(indx, self)
  File "C:\Users\waldiesamuel\AppData\Local\Programs\Python\Python35-32\lib\site-packages\xlwt\Row.py", line 43, in __init__
    raise ValueError("row index was %r, not allowed by .xls format" % rowx)
ValueError: row index was 65537, not allowed by .xls format
Sam Waldie
  • 25
  • 7
  • 1
    See my updated answer. You original code for adding a sheet will only add one sheet when `rowi == 65536`, and then rowi was reset to 1, but of course that's not happening any more, so you could use modulo here as well so that a sheet is added for every multiple of 65536. You could, of course use the already calculated `rowno` if you change the order a little. – SiHa Aug 24 '16 at 14:00

2 Answers2

1

xlwt is

a library for developers to use to generate spreadsheet files compatible with Microsoft Excel versions 95 to 2003. (see here)

In those excel versions the maximal number of rows is limited by 65536. See here.

Try XlsxWriter which is compliant with Excel 2007 and number of rows can be up to 1,048,576.

Community
  • 1
  • 1
Tom Ron
  • 5,906
  • 3
  • 22
  • 38
  • The problem here is that one of my constraints doing this is that I need to use xls. I'm aware that the constraint exists, what I'm trying to do is within the python, create a new worksheet when the row index reaches that value, then continue writing to the new worksheet – Sam Waldie Aug 22 '16 at 20:03
1

The problem with your solution is that you are trying to reset rowi (which comes from your enumerate() statement) back to 1, but it is reset on the next loop.

The easiest way to achieve what you want, I think, is to change the way you reference rows and sheets. You can use the floor division and modulo operators to give you the sheet number and row numbers respectively.

if rowi % 65536 == 0:
    sheet[dict_counter] = workbook.add_sheet("SQL Results (" + str(dict_counter) + ")")
    sheet_counter += 1 # Not sure if you use this anywhere else - it can probably go
    dict_counter += 1
else:
    pass

sheetno = rowi // 65536
rowno = rowi %% 65536
sheet[sheetno].write(rowno,coli,value,bold)
Community
  • 1
  • 1
SiHa
  • 7,830
  • 13
  • 34
  • 43