0

Problem

I was trying to implement an web API(based on Flask), which would be used to query the database given some specific conditions, reconstruct the data and finally export the result to a .csv file.

Since the amount of data is really really huge, I can not construct the whole dataset and generate the .csv file all at once(e.g. create a DataFrame using pandas and finally call df.to_csv()), because that would cause a slow query and maybe the http connection would end up timeout.

So I create a generator which query the database 500 records per time and yield the result one by one, like:

def __generator(q):
    [...]    # some code here
    while True:
        if records == None:
            break
        records = q[offset:offset+limit] # q means a sqlalchemy query object
        [...]   # omit some reconstruct code
        
        for record in records:
            yield record

and finally construct a Response object, and send .csv to client side:

return Response(__generate(q), mimetype='text/csv')  # Flask

The generator works well and all data are encoded by 'uft-8', but when I try to open the .csv file using Microsoft Excel, it appears to be messy code.

Measures Already Tried

  • add a BOM header to the export file, doesn't work;

  • using some other encode like 'gb18030', and 'cp936', most of the messy code disappear, some still remained, and some part of the table structure become weird.

My Question Is

How can I make my code compatible to Microsoft Excel? That means at least two conditions should be satisfied:

  • no messy code, well displayed;

  • well structured table;

I would be really appreciated for your answer!

Community
  • 1
  • 1
streethacker
  • 308
  • 5
  • 13
  • Where are you formatting your records to use a CSV format? Excel opens CSV files with the system encoding, which is a pain as you won't know what encoding that is from a server. A UTF-8 "BOM" (not really, but it is the same Unicode codepoint used for UTF variants that actually have a byte order to mark) should help but perhaps your data isn't formatted correctly. – Martijn Pieters Jan 21 '15 at 14:55
  • 1
    What does ```messy code``` mean? – wwii Jan 21 '15 at 14:55

2 Answers2

1

By reading as text format for each column, it wont modify columns that it reads as different types like dates. Your code may be correct, and excel may just be modifying the data when it parses it as a csv - by importing as text format, it wont modify anything.

synonynome
  • 188
  • 1
  • 6
  • It worked, however, the operation appears to be a little complex(since I'm not the user of the cvs file, it would be sent to another department, they may think open a cvs file in this way a little bit difficult, compared to a xls file). I would prefer to export a xls file directly. Anyway, thanks for your help! – streethacker Jan 22 '15 at 06:00
0

I would recommend you look into xlutils. It's been around for quite some time, and our company has used it both for reading configuration files to run automated test and for generating reports of test results.

Jay Atkinson
  • 3,279
  • 2
  • 27
  • 41