
When I convert a properties file to JSON, an extra backslash is added before each Unicode escape sequence. How can I avoid this? See the code below.

Input File (sample.properties)

property.key.CHOOSE=\u9078\u629e

Code

import json
def convertPropertiesToJson(fileName, outputFileName, sep='=', comment_char='#'):
    props = {}
    with open(fileName, "r") as f:
        for line in f:
            l = line.strip()
            if l and not l.startswith(comment_char):
                innerProps = {}
                keyValueList = l.split(sep)
                key = keyValueList[0].strip()
                keyList = key.split('.')
                value = sep.join(keyValueList[1:]).strip()
                if keyList[1] not in props:
                    props[keyList[1]] = {}
                innerProps[keyList[2]] = value
                props[keyList[1]].update(innerProps)
    with open(outputFileName, 'w') as outfile:
        json.dump(props, outfile)

convertPropertiesToJson("sample.properties", "sample.json")

Output: (sample.json)

{"key": {"CHOOSE": "\\u9078\\u629e"}}

Expected Result:

{"key": {"CHOOSE": "\u9078\u629e"}}
Neo
  • Are you sure this is an extra backslash? At first glance, I would have guessed that the two backslashes are needed to render a literal backslash before the `u` in `\u`. Otherwise, `\u` just escapes `u`, which probably just yields plain `u`. – Tim Biegeleisen Mar 16 '18 at 05:29
  • Yes, it does. He provided a minimal test case. – 0x01 Mar 16 '18 at 05:36
  • see this: https://stackoverflow.com/questions/49315872/how-to-convert-string-containing-unicode-escape-u-to-utf-8-string – Rahul Mar 16 '18 at 08:43

3 Answers


The problem is that the input is read as-is, so each `\u` escape is copied literally as a backslash followed by `u`, and json.dump then has to escape that backslash. The easiest fix is probably this:

with open(fileName, "r", encoding='unicode-escape') as f:

This will decode the escaped Unicode characters as the file is read.
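
For context, here is a minimal sketch of how that one-line change could fit into the conversion from the question. The helper name `load_properties` and the temporary file are only for illustration and are not part of the original code. Note that the `unicode-escape` codec treats raw bytes as Latin-1, so this assumes the properties file is plain ASCII with `\uXXXX` escapes.

```python
import json
import os
import tempfile


def load_properties(file_name, sep="=", comment_char="#"):
    """Hypothetical helper: read 'prefix.group.name=value' lines into a nested dict."""
    props = {}
    # encoding='unicode-escape' turns \uXXXX sequences into real characters on read
    with open(file_name, "r", encoding="unicode-escape") as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith(comment_char):
                continue
            key, _, value = stripped.partition(sep)
            key_parts = key.strip().split(".")
            # Same layout as in the question: second part is the group, third the name
            props.setdefault(key_parts[1], {})[key_parts[2]] = value.strip()
    return props


# Recreate the sample input in a temporary directory so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.properties")
    with open(path, "w", encoding="ascii") as f:
        f.write("property.key.CHOOSE=\\u9078\\u629e\n")
    props = load_properties(path)

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the two decoded characters are written back as single-backslash escapes
json_text = json.dumps(props)
print(json_text)
```

The decoded value holds two real characters, so the JSON encoder has no literal backslash left to escape.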

VPfB
  • This! Use this solution. That's the answer that I wanted to give before, too, but it didn't work for me as I was erroneously running the code with python2 instead of python3 before. – 0x01 Mar 16 '18 at 07:38
  • I was also looking for an answer. I asked a similar question; see: https://stackoverflow.com/questions/49315872/how-to-convert-string-containing-unicode-escape-u-to-utf-8-string – Rahul Mar 16 '18 at 08:43

The problem seems to be that the file stores Unicode characters as escape sequences, and these are read as plain text. You need to decode them at some point.

Changing

l = line.strip()

to (for Python 2.x)

l = line.strip().decode('unicode-escape')

or to (for Python 3.x)

l = line.strip().encode('ascii').decode('unicode-escape')

gives the desired output:

{"key": {"CHOOSE": "\u9078\u629e"}}
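
As a rough illustration of the Python 3 variant, the sketch below decodes one line in memory and serializes the result in two ways. The helper name `decode_escapes` is made up for this example. It assumes the line is pure ASCII; `encode('ascii')` raises `UnicodeEncodeError` if the line already contains non-ASCII characters.

```python
import json


def decode_escapes(text):
    """Hypothetical helper: turn literal \\uXXXX sequences into real characters.

    Assumes `text` is pure ASCII; otherwise encode('ascii') raises UnicodeEncodeError.
    """
    return text.encode("ascii").decode("unicode-escape")


# What reading the sample file without decoding yields: a backslash, 'u', four hex digits
line = "property.key.CHOOSE=\\u9078\\u629e"
value = decode_escapes(line.strip()).split("=", 1)[1]

# Default ensure_ascii=True writes the characters back as \uXXXX escapes
escaped_json = json.dumps({"CHOOSE": value})
# ensure_ascii=False keeps the characters themselves in the output
literal_json = json.dumps({"CHOOSE": value}, ensure_ascii=False)

print(escaped_json)
print(literal_json)
```

Both outputs describe the same JSON string; they differ only in how the two characters are written.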
0x01

I don't know the solution to your problem, but I found out where the problem occurs.

with open('sample.properties', encoding='utf-8') as f:
    for line in f:
        print(line)
        print(repr(line))
        d = {}
        d['line'] = line
        print(d)

out:
property.key.CHOOSE=\u9078\u629e
'property.key.CHOOSE=\\u9078\\u629e'
{'line': 'property.key.CHOOSE=\\u9078\\u629e'}

I don't know why putting the string into a dictionary shows the repr of the string.
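
A possible explanation, sketched below: printing a dictionary shows each value with `repr()`, which doubles backslashes for display only, so the string itself is unchanged by being stored. The real change happens in `json.dumps`, which must escape the literal backslash to produce valid JSON. The variable names here are only illustrative.

```python
import json

# The value as read from the file: each escape is a literal backslash plus 'u9078' etc.
value = "\\u9078\\u629e"

print(value)        # shows one backslash per escape
print(repr(value))  # repr doubles each backslash for display
print(len(value))   # 12 characters: two escapes of six characters each

d = {"line": value}
print(d)            # dict printing uses repr() for its values

# The stored string is still the same object contents, with two backslashes in total
stored = d["line"]

# JSON requires a literal backslash to be written as two backslashes
json_text = json.dumps(d)
print(json_text)
```

So the dictionary does not alter the data; only the unescaped backslash in the input leads to the doubled backslash in the JSON file.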

Rahul