I am using the unicodecsv drop-in module for Python 2.7 to read a CSV file containing columns of words in 28 different languages, some of which are accented and/or use completely different alphabets or writing systems. I am loading the CSV as follows:
# unicodecsv imported under the name csv, as a drop-in replacement
import unicodecsv as csv

with open(sourceFile, 'rU') as keywordCSV:
    keywordList = csv.reader(keywordCSV, encoding='utf-8-sig', dialect=csv.excel)
but reading from keywordList is currently producing Unicode escape sequences rather than the native characters. Whilst this is not ideal (ideally I would be able to load the Unicode text in the CSV as native characters from the start), it is acceptable so long as I can convert these into native characters later on in the script (when exporting to whichever file type will make this easiest). How is this, or preferably the ideal case, done? I have tried using workarounds such as these to no avail, and I am still not sure if this is an interpreter issue or an encoding issue within the script.
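To make the symptom concrete, here is a minimal sketch of what I suspect is going on, assuming the reader yields rows of unicode strings. The row below is hypothetical and stands in for real data. The escapes may only be the repr() that Python 2 uses when a whole list is printed, while each element already holds the real characters.

```python
# Hypothetical row standing in for one row yielded by the reader.
row = [u'caf\u00e9', u'\u4e2d\u6587']

# Printing a container uses repr() on each element, which is where the
# escaped spelling comes from in Python 2; the elements themselves are
# ordinary unicode strings of 4 and 2 characters.
lengths = [len(cell) for cell in row]

# The same text in two byte forms: an ASCII-escaped view, and the UTF-8
# bytes a file or terminal would actually receive.
escaped = row[1].encode('unicode_escape')
utf8_bytes = row[1].encode('utf-8')

# If a string really did contain literal backslash escapes, decoding
# with 'unicode_escape' would turn them back into characters.
restored = escaped.decode('unicode_escape')

print(lengths)
print(restored == row[1])
```

If that assumption holds, nothing in the data needs converting; only the way it is displayed or written out differs.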
The reason I have used utf-8-sig when reading the file is that not doing so was resulting in a (BOM-related)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155:
but this has now stopped happening for reasons unknown to me. Similarly, I am using 'rU' when opening the file, as not doing so produces a
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
but I am not sure whether either of these is appropriate.
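For reference, this is a small sketch of how I understand the BOM behaviour, using made-up bytes rather than my real file. Decoding with 'utf-8' seems to leave the BOM in place as a leading u'\ufeff', whereas 'utf-8-sig' strips it, and encoding that character as ASCII fails in the same way as the error quoted above.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by one CSV line.
raw = codecs.BOM_UTF8 + u'mot,\u00e9t\u00e9\n'.encode('utf-8')

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character ...
with_bom = raw.decode('utf-8')

# ... whereas 'utf-8-sig' strips it when it is present.
without_bom = raw.decode('utf-8-sig')

# Encoding U+FEFF with the ASCII codec raises UnicodeEncodeError.
try:
    with_bom.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(repr(with_bom[0]))
print(ascii_failed)
```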
In this question, printing each character one by one results in the native characters being printed (something that also works in my code when run from the terminal). Is there a way of iterating through the characters and converting each one to its native character?
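As a sketch of the export step I have in mind, assuming the cells are already unicode strings, a per-character loop may not be needed at all: writing through io.open with an explicit encoding should produce the native characters in the output file. The rows and the output path below are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical rows standing in for what the reader yields.
rows = [[u'caf\u00e9', u'\u4e2d\u6587'], [u'na\u00efve', u'\u0440\u0443\u0441']]

# Hypothetical output path, created in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'keywords.txt')

# io.open encodes unicode text to UTF-8 on write, on Python 2 and 3 alike.
with io.open(out_path, 'w', encoding='utf-8') as out:
    for row in rows:
        out.write(u'\t'.join(row) + u'\n')

# Reading the file back yields the same characters, with no escapes.
with io.open(out_path, 'r', encoding='utf-8') as check:
    round_trip = check.read()

print(round_trip == u''.join(u'\t'.join(r) + u'\n' for r in rows))
```

Opening the resulting file in a UTF-8 aware editor would be the real check of whether the characters survive.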
Apologies for posting another question on this already saturated topic, but I haven't been able to get other people's suggestions working for this case. Perhaps I have been looking in the wrong place in trying to decode the encoded CSV output at the end of the script, and rather the problem is in my csv.reader's encoding. Any help will be very much appreciated.