1

I want to convert an xlsx file to TAB delimited csv using python. After reading I was pointed to library called openpyxl (code below)

def importXLSX(fileName):
        temp = os.path.basename(fileName).split('.xlsx')
        tempFileName = os.path.dirname(os.path.abspath(fileName))+"/TEMP_"+temp[0]+".csv"
        tempFile = open(tempFileName,'w')
        wb = load_workbook(filename=fileName)
        ws = wb.worksheets[0] #Get first worksheet
        for row in ws.rows: #Iterate over rows
                for cell in row:
                        cellValue = "" 
                        if cell.value is not None:
                                cellValue = cell.value
                        tempFile.write(cellValue+'\t')
                tempFile.write('\n')
        os.remove(fileName)
        return tempFileName

The input file I have contains billing data, but this function is converting $2,000 to 2000 and 0.00 to 0

Any idea why?

Syntax_Error
  • 5,964
  • 15
  • 53
  • 73

1 Answers1

2

This is because in Excel when you set the format for a cell to e.g. currency format so the value 2000 displays as $2000.00, you aren't changing the value in the cell, and openpyxl is reading the value in the cell, not its formatted/displayed presentation. if you had typed the string '$2000.00 in the cell then that's what openpyxl would see. Similarly for display format showing two decimal places, so 0 is displayed as 0.00, the value in the cell is still 0, so that's what openpyxl sees.

There are other questions on SO about reading cell formatting using the xlrd library instead of openpyxl: for example see How to get Excel cell properties in Python and Identifying Excel Sheet cell color code using XLRD package and in general google python excel read cell format

Community
  • 1
  • 1