0

Am trying to convert csv to xlsx using python 2.7 version but it gives me error like

Traceback (most recent call last):
  File "C:/Users/XYZ/Desktop/converter.py", line 13, in <module>
    ws.cell(row=r, column=c).value = val
  File "C:\Python27\lib\openpyxl\cell\cell.py", line 291, in value
    self._bind_value(value)
  File "C:\Python27\lib\openpyxl\cell\cell.py", line 190, in _bind_value
    value = self.check_string(value)
  File "C:\Python27\lib\openpyxl\cell\cell.py", line 149, in check_string
    value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 1037: invalid start byte

And My code is

import os
import glob
import csv
import openpyxl 
for csvfile in glob.glob(os.path.join('.', 'i.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'rb') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')

Below is the CSV File name i.csv AFFECTED PLATFORM Column Name and 1 row Cell data.

"•  Adobe Photoshop CC 2018 version 19.1.3 and earlier. <br />
•   Adobe Photoshop CC 2017 version 18.1.3 and earlier for mac OS.<br />
•   Adobe Photoshop CC 2017 version 18.1.2 and earlier for Windows.<br />
•   Adobe Acrobat DC Consumer version 2018.011.20038 and earlier.<br />
•   Adobe Acrobat Reader DC Consumer version 2018.011.20038 and earlier.    <br />
•   Adobe Acrobat 2017 Classic 2017 version 2017.011.30079 and earlier. <br />
•   Adobe Acrobat Reader 2017 Classic 2017 version 2017.011.30079 and earlier.  <br />
•   Adobe Acrobat DC Classic 2015 version 2015.006.30417 and earlier. <br />
•   Adobe Acrobat Reader DC Classic 2015 version 2015.006.30417 and earlier.<br />
"
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
Parul Garg
  • 102
  • 1
  • 10
  • check this topic, looks like someone had similar problem to solve:https://stackoverflow.com/questions/17684610/python-convert-csv-to-xlsx – krakowi Aug 23 '18 at 09:50
  • You could try using a [unicode CSV reader](https://pypi.org/project/unicodecsv/). Python 3.x has better support for this. – Martin Evans Aug 23 '18 at 15:13
  • I already tried these things but didn't work in PYTHON 2.7 . @MartinEvans – Parul Garg Aug 24 '18 at 04:18
  • I would suggest you edit your CSV file down to a few lines that is still able to recreate the problem. You could then add it to the question for us to be able to recreate the problem (or post a link to the whole file) – Martin Evans Aug 24 '18 at 10:36
  • Please Check Post I added 1 row data which creates the problem. @MartinEvans – Parul Garg Aug 28 '18 at 05:03
  • If I copy/paste the single cell you have provided into a file and run your script, it does not recreate your error. It creates a single cell excel file. Are you also hoping to somehow split this information onto multiple lines? – Martin Evans Aug 28 '18 at 12:53
  • add these characters in your CSV file whose Hex Values are 0x95 and 0x93. These are creating unicode error in my file while converting. @MartinEvans – Parul Garg Aug 29 '18 at 05:08

1 Answers1

0

You have possibly saved your files using a Windows encoding, as such the unicodecsv library could be used as follows:

import os
import glob
import unicodecsv as csv
import openpyxl 

for csvfile in glob.glob(os.path.join('.', 'i.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    print csvfile
    with open(csvfile, 'rb') as f:
        reader = csv.reader(f, encoding='windows-1252')
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')

For the cell you have provided though, this would result in a single cell containing multiple lines. Further processing would be needed if you wanted to convert such entries into multiple cells. For that a better example CSV file would be needed.

The following could be used to remove the HTML <br /> tags for example:

ws.cell(row=r, column=c).value = val.replace('<br />', '')
Martin Evans
  • 45,791
  • 17
  • 81
  • 97