0

I have a JSON file that needs to be sent. Before sending it, I need to do a validity check and replace some special characters (spaces and dots).

The problem is that Python inserts a u character before each of my strings, which the server can't read. How do I remove the u characters and do the data sanitization (character replacement)?

Original JSON

{
    "columns": [
        {
            "data": "Doc.",
            "title": "Doc."
        },
        {
            "data": "Order no.",
            "title": "Order no."
        },
        {
            "data": "Nothing",
            "title": "Nothing"
        }
    ],
    "data": [
        {
            "Doc.": "564251422",
            "Nothing": 0.0,
            "Order no.": "56421"
        },
        {
            "Doc.": "546546545",
            "Nothing": 0.0,
            "Order no.": "98745"
        }
    ]
}

Python:

import json
def func():
    with open('json/simpledata.json', 'r') as json_file:
        json_data = json.load(json_file)
        print(json_data)
func()

Output JSON:

{u'data': [{u'Nothing': 0.0, u'Order no.': u'56421', u'Doc.': u'564251422'}, {u'Nothing': 0.0, u'Order no.': u'98745', u'Doc.': u'546546545'}], u'columns': [{u'data': u'Doc.', u'title': u'Doc.'}, {u'data': u'Order no.', u'title': u'Order no.'}, {u'data': u'Nothing', u'title': u'Nothing'}]}

What I'm trying to achieve in Python:

    sanitizeData: function(jsonArray) {
        var newKey;
        jsonArray.forEach(function(item) {
            for (key in item) {
                newKey = key.replace(/\s/g, '').replace(/\./g, '');
                if (key != newKey) {
                    item[newKey] = item[key];
                    delete item[key];
                }
            }
        })
        return jsonArray;
    },
    // remove whitespace and dots from data : <propName> references
    sanitizeColumns: function(jsonArray) {
        var dataProp = [];
        jsonArray.forEach(function(item) {
            dataProp = item['data'].replace(/\s/g, '').replace(/\./g, '');
            item['data'] = dataProp;
        })
        return jsonArray;
    }
Peter G.
  • that just means that the strings are unicode strings, I don't think those are actually there in the data – R Nar Nov 24 '15 at 17:45
  • my browser (Chrome) interprets them as such, and they are also not accepted by the server, while JSON without the `u` characters is accepted normally – Peter G. Nov 24 '15 at 17:48
  • Possible duplicate of [how to python prettyprint a json file](http://stackoverflow.com/questions/12943819/how-to-python-prettyprint-a-json-file) – muddyfish Nov 24 '15 at 17:49
  • I was working on a solution for the 2nd part, which got removed, but you might wanna take a look anyway at my edit – Felk Nov 24 '15 at 17:58
  • the removed part is back, if you want to take a look on it. it was also part of the original question, so I'm putting it back there. – Peter G. Nov 24 '15 at 18:02

4 Answers

3

To properly print the JSON as a string, try print(json.dumps(json_data))

See also https://docs.python.org/3/library/json.html#json.dumps
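
As a minimal sketch of the difference (the sample dictionary is a cut-down version of the data in the question, and the variable names are only illustrative): printing the dictionary shows Python's own representation, which in Python 2 includes the u'' prefixes, while json.dumps returns actual JSON text.

```python
import json

# Small sample in the shape of the question's data (illustrative only).
json_data = {"columns": [{"data": "Doc.", "title": "Doc."}]}

# repr() of a dict is Python syntax, not JSON: it uses single quotes,
# and Python 2 additionally prefixes unicode strings with u.
python_repr = repr(json_data)

# json.dumps() serializes the object to a JSON string with double quotes.
json_text = json.dumps(json_data)

print(python_repr)
print(json_text)
```

The u prefix is therefore never part of the data itself; it only appears when the Python object is printed directly instead of being serialized.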

For removing certain characters from a string you can do the obvious thing:

string = string.replace(".", "").replace(" ", "")

or, more efficiently, use str.translate (this example only works in Python 2; see this answer on how to use str.translate for your use case in Python 3):

string = string.translate(None, " .")

or with regular expressions; re.sub:

import re
string = re.sub(r"[ .]", "", string)
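
For Python 3, the str.translate variant could look roughly like the sketch below. It assumes that only spaces and dots should be deleted; the helper name strip_spaces_and_dots is made up for this example.

```python
# Python 3: the third argument of str.maketrans() lists the characters
# that str.translate() should delete.
delete_table = str.maketrans("", "", " .")

def strip_spaces_and_dots(text):
    # Return a copy of text with every space and dot removed.
    return text.translate(delete_table)

print(strip_spaces_and_dots("Order no."))  # Orderno
```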

And then just use a comprehension to go over the whole dictionary (use iteritems() with Python 2):

def sanitize(s):
    return re.sub(r"[ .]", "", s)
table = {sanitize(k): sanitize(v) for k, v in table.items()}

But this only works on a shallow dictionary. Your solution doesn't look like it handles a deeply nested structure either. If you need that, how about some recursion (for Python 2, use iteritems() instead of items() and basestring instead of str):

def sanitize(value):
    if isinstance(value, dict):
        value = {sanitize(k): sanitize(v) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v) for v in value]
    elif isinstance(value, str):
        value = re.sub(r"[ .]", "", value)
    return value
table = sanitize(table)
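
Note that the recursive version also strips spaces and dots from values. If only the row keys and the "data" references in "columns" should change, as in the JavaScript from the question, a closer port might look like the following sketch (Python 3; the names clean, sanitize_data and sanitize_columns are illustrative, and the sample table is a shortened copy of the question's data):

```python
import json
import re

def clean(name):
    # Drop all whitespace and dots, like the two JavaScript replace() calls.
    return re.sub(r"[\s.]", "", name)

def sanitize_data(rows):
    # Rebuild each row with cleaned keys; values are left untouched.
    return [{clean(key): value for key, value in row.items()} for row in rows]

def sanitize_columns(columns):
    # Clean only the "data" property, which refers to a key in the rows.
    return [dict(column, data=clean(column["data"])) for column in columns]

table = {
    "columns": [{"data": "Order no.", "title": "Order no."}],
    "data": [{"Order no.": "56421", "Nothing": 0.0}],
}
table = {
    "columns": sanitize_columns(table["columns"]),
    "data": sanitize_data(table["data"]),
}
print(json.dumps(table))
```

Unlike the JavaScript, this builds new lists and dictionaries instead of modifying the input while iterating over it.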
Felk
2

I just wanted to add a version to the excellent solution of @Felk. I had a bunch of keys that had dots in them. The solution from @Felk removed the dots from the keys, but also from the values, which I did not want. So for anyone who, like me, comes to this post looking for a solution that only sanitizes the keys, here it is.

import re

def sanitize(value, is_value=True):
    if isinstance(value, dict):
        # Keys are sanitized, values are only traversed.
        value = {sanitize(k, False): sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            # Only dictionary keys lose their dots.
            value = re.sub(r"[.]", "", value)
    return value

table = sanitize(table)
jlaur
  • Made an improvement to your answer allowing to remove special control characters that would make some APIs break when sending json strings, see below. In the meantime, thanks for your work :) – Orsiris de Jong Oct 28 '20 at 14:33
1

I too want to improve the excellent solution from @Felk and @jlaur.

In my case, Windows event logs contained unknown control characters, which weren't sanitized correctly.

Here's my version, which replaces all control characters with a space. It is compatible with Python 3.6+ because of the type hints (they can be removed to make it compatible with any Python 3.x again).

import re
from typing import Union

def json_sanitize(value: Union[str, dict, list], is_value=True) -> Union[str, dict, list]:
    """
    Modified version of https://stackoverflow.com/a/45526935/2635443

    Recursive function that allows to remove any special characters from json, especially unknown control characters
    """
    if isinstance(value, dict):
        value = {json_sanitize(k, False):json_sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [json_sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            # Remove dots from key names
            value = re.sub(r"[.]", "", value)
        else:
            # Replace every control character (including tab and newline) with a space
            value = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', value)
    return value
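
To illustrate what the value pattern in json_sanitize matches, here is a small standalone sketch. The sample string and the name CONTROL_CHARS are made up for this example; the pattern itself is the one used above.

```python
import re

# Same character class as in json_sanitize: C0 controls (0x00-0x1f),
# DEL (0x7f) and C1 controls (0x80-0x9f).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")

sample = "line1\nline2\x00end\ttab"

# Each matched control character becomes a single space.
cleaned = CONTROL_CHARS.sub(" ", sample)
print(repr(cleaned))
```

Newlines and tabs fall inside the 0x00-0x1f range, so they are replaced as well; narrow the character class if they need to be preserved.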
Orsiris de Jong
-1

example:

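 import json

 # Open the file in a with block so it is closed after loading.
 with open('data.json', 'r') as json_file:
     json_d = json.load(json_file)
 # Serialize back to a JSON string (no u prefixes) before printing.
 json_d = json.dumps(json_d)
 print(json_d)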
vatay