2

My code currently looks like this. The xls-to-csv conversion works, but writing the table into the HTML file does not.

import xlrd
import csv
import sys

# write from xls file to csv file
wb = xlrd.open_workbook('your_workbook.xls')
sh = wb.sheet_by_name('Sheet1')
your_csv_file = open('your_csv_file.csv', 'wb')
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)

for rownum in xrange(sh.nrows):
    wr.writerow(sh.row_values(rownum))

your_csv_file.close()
print "Converted from xls to csv!"
# write from csv file to html 

# if len(sys.argv) < 3:
#   print "Usage: csvToTable.py csv_file html_file"
#   exit(1)

# Open the CSV file for reading
reader = csv.reader(open("your_csv_file.csv"))

# Create the HTML file for output
htmlfile = open("data.html","w+")

# initialize rownum variable
rownum = 0

# generate table contents
for row in reader: # Read a single row from the CSV file
    for line in htmlfile:
        # this HTML comment is found in the HTML file where I want to insert the table
        if line == "<!-- Table starts here !-->":
            # write <table> tag
            htmlfile.write('<table>')
            htmlfile.write('<tr>') # write <tr> tag
            for column in row:
                htmlfile.write('<th>' + column + '</th>')
            htmlfile.write('</tr>')
            # write </table> tag
            htmlfile.write('</table>')

        #increment row count    
        rownum += 1



# print results to shell
print "Created " + str(rownum) + " row table."
exit(0)

The output is just a blank page, as the program can't find this comment:

<!-- Table starts here !-->
  • You appear to be writing a lot from scratch; all of this functionality already exists in an awesome library called pandas: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.to_html.html – Jorick Spitzen Jun 16 '15 at 08:07

4 Answers

1

Try changing the file mode from "w+" to "a+":

htmlfile = open("data.html", "a+")

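When you open data.html in w+ mode, the file is truncated, so by the time you iterate with for line in htmlfile: there is nothing left to read and the "<!-- Table starts here !-->" HTML comment is never found. Note that with a+ the file position may start at the end of the file, so call htmlfile.seek(0) before reading.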
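To make the difference between the two modes concrete, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2) and uses a throwaway temporary file rather than your real data.html; the variable names are only for illustration.

```python
import os
import tempfile

marker = "<!-- Table starts here !-->"

# Work on a throwaway file so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "data.html")
with open(path, "w") as f:
    f.write("<html>\n" + marker + "\n</html>\n")

# "a+" keeps the existing content, but in Python 3 the position
# starts at the end of the file, so a plain read returns nothing.
with open(path, "a+") as f:
    read_without_seek = f.read()
    f.seek(0)  # rewind before scanning for the marker
    lines_after_seek = f.readlines()

# "w+" truncates the file the moment it is opened.
with open(path, "w+") as f:
    content_after_w_plus = f.read()
```

After this runs, read_without_seek and content_after_w_plus are both empty strings, while lines_after_seek holds the three original lines.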

Also use line.strip() so the comparison ignores the newline at the end of each line:

if line.strip() == "<!-- Table starts here !-->":

I would recommend separating reading the HTML file from writing to it. For example, you could change your code to:

out_lines = []
with open('data.html', 'r') as htmlfile:
    # read lines once, and scan for HTML comment for each row
    lines = htmlfile.readlines()
    # generate table contents
    for row in reader: # Read a single row from the CSV file
        for line in lines:
            # this HTML comment is found in the HTML file where I want to insert the table
            if line.strip() == "<!-- Table starts here !-->":
                # write <table> tag
                out_lines.append('<table>')
                out_lines.append('<tr>') # write <tr> tag
                for column in row:
                    out_lines.append('<th>' + column + '</th>')
                out_lines.append('</tr>')
                # write </table> tag
                out_lines.append('</table>')
        # increment row count once per CSV row
        rownum += 1

# update your html file
with open('data.html', 'a') as f:
    f.write('\n'.join(out_lines))
Delimitry
1

Like Delimitry said, your file mode is not right:

w+ : Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

So the first thing it does is truncate (empty) the entire file.

Jorick Spitzen
0

The lines you read from htmlfile include a trailing newline. You must strip it before comparing:

if line.strip() == "<!-- Table starts here !-->":
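
A tiny sketch of why the unstripped comparison fails. The line value below is a stand-in for what iterating over the file gives you, not something read from your actual data.html:

```python
marker = "<!-- Table starts here !-->"

# A line as returned by iterating over a file: the newline is still attached.
line = "<!-- Table starts here !-->\n"

exact_match = (line == marker)             # compares with the "\n" included
stripped_match = (line.strip() == marker)  # compares with whitespace removed
```

Here exact_match is False and stripped_match is True.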

Hint:

HTML comments only have a ! at the beginning, not at the end. It is not forbidden to write

<!-- Table starts here !-->
-----------------------^

But the second ! is very uncommon.

0

You have two or three problems here. I'll go through them one by one, but first I want to say that I would perform this task using the Pandas library. It does far, far more than this kind of task, but if you did install it, all you would have to do to get the data into table format is:

import pandas as pd
xls = pd.ExcelFile('path_to_file.xls')
df = xls.parse('Sheet1') # parse the sheet you're interested in - results in a Dataframe
table_html = df.to_html()

You now have a string (table_html) of the entire data in HTML <table> format that you can write directly into your html file. No intermediate csv stage or anything. The documentation is available for pandas.ExcelFile.parse and pandas.DataFrame.to_html().


Problems with existing solution

1. String comparison

You are looking for the comment line to replace with your html - you are using == to compare two strings. Unless you're absolutely sure that the strings will be exactly the same - no extra whitespace, no end of lines, no extra punctuation etc - then this is often error prone.

You could strip() the line to get rid of whitespace and then use == as others have suggested. Personally I'd be tempted to be more permissive and use the in keyword, something like:

if '<!-- Table starts here' in line:

Then it doesn't matter about whether the latter ! is in the string, or whitespace before or after the text etc. You might be even more permissive and use a regular expression such that you can have any whitespace between the comment marker and the text. You will probably know how precise the string will be in the .html file that you're working with.
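
If you do go the regular expression route, a sketch could look like the following. The pattern is my own guess at what "permissive" should mean here (optional whitespace around the text, optional trailing !), so adjust it to whatever your .html file really contains:

```python
import re

# Hypothetical pattern: optional whitespace around the text
# and an optional "!" before the closing "-->".
TABLE_MARKER = re.compile(r"<!--\s*Table starts here\s*!?\s*-->")

candidates = [
    "<!-- Table starts here !-->\n",
    "   <!--Table starts here-->  \n",
    "<!--   Table starts here   -->\n",
    "<p>Table starts here</p>\n",
]

# search() finds the marker anywhere in the line, so leading
# indentation and the trailing newline do not matter.
matches = [bool(TABLE_MARKER.search(line)) for line in candidates]
```

The first three candidates match and the last one (plain text, not a comment) does not.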

2. Reading and writing the .html file concurrently

You're trying to insert text in the middle of a file. There's a Q&A covering methods for how to do that. In brief, in your case (relatively small data, i.e. one .html file) I would read all the lines into a list and then insert the table HTML at the point you want, e.g.

content = []
insert_index = None
with open('data.html', 'r') as htmlfile:
    for line in htmlfile:
        content.append(line)
        if '<!-- Table starts here' in line:
            insert_index = len(content)

if insert_index is not None:
    # end the inserted HTML with a newline so it sits on its own line(s)
    content.insert(insert_index, table_html + '\n')

Note I'm assuming you've got table_html using the Pandas method at the start. If you don't want to do that for some reason and still want to get the content via csv, you can always build up table_html by creating an empty string and then adding on all the HTML elements in a similar way to how your loop does it now.
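
For the csv route, a rough sketch of building table_html as a string might look like this. It assumes Python 3, and the helper name rows_to_table_html is made up for illustration. Treating the first row as the header row (<th>) and the rest as data (<td>) is also my assumption; your original loop wrote every cell as <th>.

```python
import csv
import io
from html import escape


def rows_to_table_html(rows):
    """Build an HTML table string: first row as <th> cells, the rest as <td>."""
    parts = ["<table>"]
    for rownum, row in enumerate(rows):
        tag = "th" if rownum == 0 else "td"
        # escape() keeps characters such as & and < from breaking the markup
        cells = "".join("<{0}>{1}</{0}>".format(tag, escape(col)) for col in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Stand-in for open('your_csv_file.csv'); csv.reader accepts any iterable of lines.
sample_csv = io.StringIO('"name","qty"\n"nuts & bolts","3"\n')
table_html = rows_to_table_html(csv.reader(sample_csv))
```

The resulting table_html can then be inserted into content exactly like the pandas version.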

3. Writing the html

Others have noted that you could use the append mode of file opening, rather than the write mode. This is fine, but if you use the method above to read all the content into a list and insert within the list, you can then simply do:

with open('data.html', 'w') as f:
    # the lines in content still end with their original newlines
    f.write(''.join(content))
J Richard Snape
  • Thanks a lot! Answers like this are what make this site so awesome. One thing though, how would I make it so that a new spreadsheet's values would write over the old table in the HTML, assuming the rows and columns formatting are the same? – themostenjoyableday Jun 16 '15 at 13:32
  • Ooof - that's a different (and much harder) question. The way I'd do it is to put an additional comment marker at the end (i.e. after inserting `table_html`) and then adapt your code so that when it detects the table start marker, it doesn't add to `content` until it detects the table end marker. Everything else would stay the same. Have a go at that - if you hit further problems it's time for a new StackOverflow question :) You can link to it here, of course, but it's too big a question to fully explain in comments / edit into the existing one. – J Richard Snape Jun 16 '15 at 14:28
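
For anyone landing here later, the start/end marker idea from the comment above could be sketched roughly as below. This is only an illustration, not code from the thread: the function name and the "Table ends here" marker text are hypothetical, and it assumes both marker comments are already present in the file. If the end marker is missing, everything after the start marker would be dropped.

```python
def replace_between_markers(lines, table_html, start_text, end_text):
    """Return new lines with everything between the two marker lines
    replaced by table_html. The marker lines themselves are kept."""
    result = []
    skipping = False
    for line in lines:
        if skipping:
            # Drop old table lines until the end marker shows up.
            if end_text in line:
                skipping = False
                result.append(line)
            continue
        result.append(line)
        if start_text in line:
            result.append(table_html + "\n")
            skipping = True
    return result


page = [
    "<body>\n",
    "<!-- Table starts here -->\n",
    "<table>old</table>\n",
    "<!-- Table ends here -->\n",
    "</body>\n",
]
updated = replace_between_markers(
    page, "<table>new</table>", "<!-- Table starts here", "<!-- Table ends here"
)
```

Running it again on updated with the same table leaves the page unchanged, which is the "write over the old table" behaviour asked about.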