Optimizing excel data collection/reduction using xlrd/xlwt and loop iteration

Question

I have just recently started coding in Python and have a great deal to learn. The goal of my code is to pull a string from a cell, check its character length and replace words with specific abbreviations. Then I write the new string into a different excel sheet and save once all of the data has been reduced. I finally figured out how to get it to work, but it does take a long time. I am working with 10,000 plus cells of strings and my loop iteration is probably far from optimized. If you have any information that would help that would be great.

import xlwt
import xlrd

book = xlrd.open_workbook() # opens excel file for data input
reduc = xlwt.Workbook()     # creates the workbook that the reduced data will be saved in

# Calls the sheets I will be working with
Data = book.sheet_by_index(3)
Table = book.sheet_by_index(5)
sheet1 = reduc.add_sheet("sheet 1")

# the initial loop pulls the string from excel

for x in xrange(30): # I use a limited range for debugging
    From = str(Data.col(15)[x].value)
    To = str(Data.col(16)[x].value)
    print x # I just print this to let me know that i'm not stuck

    if len(From) <= 30 and len(To) <= 30:
        sheet1.write(x, 3, From)
        sheet1.write(x, 4, To)
    else:
        while len(From) > 30 or len(To) > 30:
            for y in xrange(Table.nrows):
                word = str(Table.col(0)[y].value)
                abbrv = str(Table.col(1)[y].value)
                if len(From) > 30:
                    From = From.replace(word, abbrv)
                if len (To) > 30:
                    To = To.replace(word, abbrv)
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
            break

reduc.save("newdoc.xls")
print " DONE!

Below is my updated code. It is almost instant which is what I expected. I pre-loaded all the columns that I wanted and then ran it though the same loop system. I then stored instead of wrote the data to the new excel file. After all of the data has been reduced I saved each cell in a separate for loop. Thanks for the suggestions guys.

import xlwt
import xlrd

# Workbook must be located in the Python27 folder in the C:/directory
book = xlrd.open_workbook() # opens exel file for data input

# Calls the sheets I will be working with
Data = book.sheet_by_index(0)
Table = book.sheet_by_index(1)

# Import column data from excel
From = Data.col_values(15)
To = Data.col_values(16)
word = Table.col_values(0)
abbrv = Table.col_values(1)

# Empty variables to be filled with reduced string
From_r = []
To_r = []

# Notes to be added 
for x in xrange(Data.nrows):
    if len(From[x]) <= 28 and len(To[x]) <= 28:
        From_r.append(From[x])
        To_r.append(To[x])
    else:
        while len(From[x]) > 28 or len(To[x]) > 28:
            for y in xrange(Table.nrows):
                if len(From[x]) > 28:
                    From[x] = From[x].replace(word[y], abbrv[y])
                if len (To[x]) > 28:
                    To[x] = To[x].replace(word[y], abbrv[y])
            From_r.append(From[x])
            To_r.append(To[x])
            break

# Create new excel file to write reduced strings into
reduc = xlwt.Workbook()
sheet1 = reduc.add_sheet("sheet 1")

# Itterate through list to write each object into excel
for i in xrange(Data.nrows):
    sheet1.write(i, 3, From_r[i])
    sheet1.write(i, 4, To_r[i])

# Save reduced string in new excel file
reduc.save("lucky.xls")
print " DONE! "

score 1 · Accepted Answer · edited May 23 '17 at 11:58

The slowness is probably becouse of an inefficient replacement code. You should try and load inn all the words and corresponding abbreviations, unless the list is so large you'll run out of memory. And then for even more speed up you could replace against all words on one go.

Do this and move it out of the loop

words = [str(cell.value) for cell in Table.col(0)] #list comprehension
abbr = [str(cell.value) for cell in Table.col(1)]
replacements = zip(words, abbr)

This function from here use the regular expression module for replacing against all matches in a given list.

import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
    return lambda string: pattern.sub(replacement_function, string)

To use it do this:

replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered.
myString = replaceFunc(myString)

Thank you for your input, I will load in all the words and abbrvs. The only problem with the replacing against all matches is I am just trying to get the strings to a specific length. The strings will be used for indexing a system and I want it to be as easy to read while still fitting in a specific box. — user3081146, Dec 09 '13 at 02:56
You could use a different approach like splitting the strings and replacing terms left-to-right or right-to-left until the total length is short enough, but it might be slower than the regular expression answer. Personally I suspect it would be good enough, since you still wouldn't have the double nesting that makes this so time consuming and tens of thousands really isn't that much, but you'd have to try it out and see. If that sounds like something you'd like to test out I can add an answer showing the details. — Peter DeGlopper, Dec 09 '13 at 05:37

Optimizing excel data collection/reduction using xlrd/xlwt and loop iteration

1 Answers1