
I'm writing JSON data with special characters (å, ä, ö) to a file and then reading it back in. I then use this data in a subprocess command. When I use the data I read back, I cannot get the special characters translated back to å, ä and ö.

When I run the Python script below, the list "command" is printed as:

['cmd.exe', '-Name=M\xc3\xb6tley', '-Bike=H\xc3\xa4rley', '-Chef=B\xc3\xb6rk']

But I want it to be printed like this:

['cmd.exe', '-Name=Mötley', '-Bike=Härley', '-Chef=Börk']

Python Script:

# -*- coding: utf-8 -*-

import os, json, codecs, subprocess, sys


def loadJson(filename):
    with open(filename, 'r') as input:
        data = json.load(input)
    print 'Read json from: ' + filename
    return data

def writeJson(filename, data):
    with open(filename, 'w') as output:
        json.dump(data, output, sort_keys=True, indent=4, separators=(',', ': '))
    print 'Wrote json to: ' + filename



# Write JSON file
filename = os.path.join( os.path.dirname(__file__) , 'test.json' )
data = { "Name" : "Mötley", "Bike" : "Härley", "Chef" : "Börk" }
writeJson(filename, data)


# Load JSON data
loadedData = loadJson(filename)


# Build command
command = [ 'cmd.exe' ]

# Append arguments to command
arguments = []
arguments.append('-Name=' + loadedData['Name'] )
arguments.append('-Bike=' + loadedData['Bike'] )
arguments.append('-Chef=' + loadedData['Chef'] )
for arg in arguments:
    command.append(arg.encode('utf-8'))

# Print command (my problem; these do not contain the special characters)
print command

# Execute command
p = subprocess.Popen( command , stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# Read stdout and print each new line
sys.stdout.flush()
for line in iter(p.stdout.readline, b''):
    sys.stdout.flush()
    print(">>> " + line.rstrip())
ivan_pozdeev
fredrik
  • print the strings in the list instead of the list and the special characters will magically reappear –  Oct 02 '13 at 13:34
  • `M\xc3\xb6tley` _is_ `Mötley`, encoded in utf8, just as you wrote. Your code is fine. – georg Oct 02 '13 at 13:44
  • @hop - Printing the list like that was just to illustrate that the values did not contain the åöä characters. It's in the subprocess.Popen where I get the real problems, as the arguments are not containing the åöä characters. – fredrik Oct 02 '13 at 13:56
  • possible duplicate of [Unicode in python](http://stackoverflow.com/questions/9867749/unicode-in-python) –  Oct 02 '13 at 13:58
  • @fredrik: a) you are wrong. b) your problem is probably windows. c) are you sure cmd.exe can handle utf-8? –  Oct 02 '13 at 14:01
  • @hop - you were right. I was wrong. The code is correct. Cmd.exe was just some executable I picked to illustrate my imaginary issue. Thank you. – fredrik Oct 02 '13 at 14:04
  • The strings _are_ utf-8; it's just that `print command` is outputting the `repr()` of each member of the list. Try `for s in command:`, `print s` and it should look OK. – martineau Oct 02 '13 at 14:06
  • You can pretty print it with `print '[' + ', '.join("'"+elem+"'" for elem in command) + ']'`. – martineau Oct 02 '13 at 14:15

1 Answer


What you see is the canonical representation of a string constant in Python, which is designed to avoid encoding issues: it is what repr() returns for a string. A list's str() implementation, which is what runs when the list is printed, calls repr() on each of its members to represent them.
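
As a rough illustration of the point above (not part of the original script; the variable names are made up for this sketch), the following compares the text of a list with the repr() of its single member. It is written so that it should also run on Python 3, where byte strings additionally carry a `b` prefix in their repr().

```python
# -*- coding: utf-8 -*-
# Sketch only: 'name' and 'command' are illustrative names.

# The UTF-8 bytes for "Mötley", built from an escaped unicode literal
# so the example does not depend on the source file encoding.
name = u'M\xf6tley'.encode('utf-8')
command = [name]

list_text = str(command)      # what "print command" would show
element_repr = repr(name)     # escaped form, contains \xc3\xb6

# The list's text is the member's repr() wrapped in brackets.
print(list_text == '[' + element_repr + ']')

# Nothing is lost: the bytes still decode to the original text.
print(name.decode('utf-8') == u'M\xf6tley')
```

Both comparisons should print True, which is why the escaped list output does not mean the data is damaged.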

The only way to output a string's non-ASCII characters as they are is to print the string itself or otherwise write it to a stream. See "Why does Python print unicode characters when the default encoding is ASCII?" for how character conversion is done on printing. Also note that for non-ASCII 8-bit characters, the output will differ between terminals set up for different codepages.
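
To make the "write it to a stream" part concrete, here is a minimal sketch that writes the same text to in-memory streams with two different encodings. The helper `written_bytes` is hypothetical and only exists for this example, and the choice of utf-8 and cp1252 is an assumption made for illustration.

```python
import io

text = u'-Name=M\xf6tley'   # "-Name=Mötley" as a unicode string


def written_bytes(text, encoding):
    """Return the bytes a text stream with this encoding would emit."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()              # push the encoded bytes into the buffer
    return buffer.getvalue()


utf8_bytes = written_bytes(text, 'utf-8')
cp1252_bytes = written_bytes(text, 'cp1252')

# Same text, different bytes: the stream's encoding decides what is written.
print(utf8_bytes != cp1252_bytes)
```

A terminal behaves much like the second half of this picture: it decodes whatever bytes arrive using its own codepage, so the bytes have to match it to display correctly.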

Regarding the solution:

The simplest one is to write an alternative to str(list) that calls str() on the members instead of repr(), bearing in mind the warnings above.

def list_nativechars(l):
  assert isinstance(l,list)
  return "[" + ", ".join('"'+str(i)+'"' for i in l) + "]"

Now (in cp866 console encoding):

>>> l=["йцукен"]
>>> print list_nativechars(l)
["йцукен"]

With data in a foreign encoding:

# encoding: cp858
<...>
l= ['cmd.exe', '-Name=Mötley', '-Bike=Härley', '-Chef=Börk']
print list_nativechars(l)

c:\>python t.py
["cmd.exe", "-Name=MФtley", "-Bike=HДrley", "-Chef=BФrk"]
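
The substituted letters in that output can be reproduced without a console. This is a sketch under the assumption that the script's bytes are cp858 and the console decodes them as cp866, as in the example above; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: bytes written in one codepage, displayed in another.

source_text = u'-Name=M\xf6tley'        # "-Name=Mötley"

raw = source_text.encode('cp858')       # bytes as stored in the cp858 script
shown = raw.decode('cp866')             # how a cp866 console interprets them

# The byte that means "ö" in cp858 is a Cyrillic letter in cp866.
print(shown == u'-Name=M\u0424tley')    # \u0424 is CYRILLIC CAPITAL LETTER EF
```

The bytes never change; only the codepage used to interpret them does, which is the warning about terminals set up for different codepages.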
ivan_pozdeev