Valid JSON Load in Python file

Question

Running into a problem with my JSON:

The first issue was SyntaxError: Non-ASCII character '\xe2' in file, so I added # -*- coding: utf-8 -*- at the top of my file.

Then, when I load my JSON with x = json.loads(x), I get ValueError: Expecting , delimiter: line 3 column 52 (char 57). I referenced this Stack Overflow solution and so added an r in front of my JSON:

x = r"""[
  { my validated json... }
]"""

But then I get the error TypeError: sequence item 3: expected string or Unicode, NoneType found. I think the r is throwing it off somehow?

The JSON resembles the following:

[
  {
    "brief": "Brief 1",
    "description": "Description 1",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
    ],
    "price": "145",
    "tags": [
      "tag1",
      "tag2",
      "tag3"
    ],
    "title": "Title 1"
  },
  {
    "brief": "Brief 2",
    "description": "Description 2",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
    ],
    "price": "150",
    "tags": [
      "tag4",
      "tag5",
      "tag6",
      "tag7",
      "tag8"
    ],
    "title": "Title 2"
  },{
    "brief": "blah blah 5'0\" to 5'4\"",
    "buyerPickup": true,
    "condition": "Good",
    "coverShipping": false,    
    "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
    ],
    "price": "240",
    "tags": [
      "tag2",
      "5'0\"-5'4\""
    ],
    "title": "blah blah 17\" Frame",
    "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
  } 
]

CURRENT CODE

# -*- coding: utf-8 -*-

import csv
import json

x = """[
      {
        "brief": "Brief 1",
        "description": "Description 1",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
        ],
        "price": "145",
        "tags": [
          "tag1",
          "tag2",
          "tag3"
        ],
        "title": "Title 1"
      },
      {
        "brief": "Brief 2",
        "description": "Description 2",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
        ],
        "price": "150",
        "tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],
        "title": "Title 2"
      },{
        "brief": "blah blah 5'0\" to 5'4\"",
        "buyerPickup": true,
        "condition": "Good",
        "coverShipping": false,    
        "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
        ],
        "price": "240",
        "tags": [
          "tag2",
          "5'0\"-5'4\""
        ],
        "title": "blah blah 17\" Frame",
        "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
      } 
    ]"""

x = json.loads(x)

f = csv.writer(open("example.csv", "wb+"))

f.writerow(["Handle","Title","Body (HTML)", "Vendor","Type","Tags","Published","Option1 Name","Option1 Value","Variant Inventory Qty","Variant Inventory Policy","Variant Fulfillment Service","Variant Price","Variant Requires Shipping","Variant Taxable","Image Src"])

    for x in x:

        allTags = "\"" + ','.join(x["tags"]) + "\""
        images = x["photos"]
        f.writerow([x["title"],
                    x["title"],
                    x["description"],
                    "Vendor Name",
                    "Widget",
                    allTags,
                    "TRUE",
                    "Title",
                    "Default Title",
                    "1",
                    "deny",
                    "manual",
                    x["price"],
                    "TRUE",
                    "TRUE",
                    images.pop(0) if images else None])
        while images:
            f.writerow([x["title"],None,None,None,None,None,None,None,None,None,None,None,None,None,None,images.pop(0)])

ERROR MESSAGE: Full traceback that I see:

Traceback (most recent call last):
  File "runnit2.py", line 976, in <module>
    allTags = "\"" + ','.join(x["tags"]) + "\""
TypeError: sequence item 3: expected string or Unicode, NoneType found

UPDATE: I've identified that the data, specifically x["title"] and x["description"], has some characters that the code doesn't like: 'ascii' codec can't encode character u'\u201d' in position 9: ordinal not in range(128). I've done a quick fix with x["description"].encode('utf-8'), etc., but it pretty much eliminates everything that's in that cell. Is there a better way that doesn't delete everything after the offending character?

Can you also add your JSON? Errors suggest the problem might be there. — hgazibara, Jul 06 '18 at 18:57
More generally, we need a minimal, complete, and verifiable example before we can debug your problem. With what you've given us, all anyone can see is that there must be something wrong somewhere in some of your code or data, which isn't going to help you very much. — abarnert, Jul 06 '18 at 19:01
@hgazibara I've included a close example of the JSON, which I think is representative of the most convoluted parts of the JSON with strange characters etc. — maudulus, Jul 06 '18 at 19:23
Can't reproduce. That loads fine for me (Python 3.6.2). What version of Python are you using? — glibdud, Jul 06 '18 at 19:26
Loads fine for me in 2.7.13 as well (once the `r` is added before the JSON string). — glibdud, Jul 06 '18 at 19:29
When I run the code as posted, I get an error about the `"blah blah 5'0` line, which makes sense because `\"` in a non-raw Python string literal is just a `"`, not a backslash-escaped quote. When I add the `r` prefix, as you say you did, the code works, both [in Python 3](https://repl.it/repls/ElderlyGoodDefinition) and [in Python 2](https://repl.it/repls/OpenLargeNasm). Please post code that actually demonstrates the exception you want help with. — abarnert, Jul 06 '18 at 20:13
Also, please post the full traceback, not just the exception description. I can't see anything in your posted code that could possibly generate that exception—but I could be wrong about that, and if I am, showing us the exception would tell us where I'm wrong. — abarnert, Jul 06 '18 at 20:13
That error message looks like the one you'd get from `str.join` when you pass it a sequence like `['abc', 'def', 'ghi', None]`. So, if I had to take a wild guess: even though you're creating a `csv.writer`, you're also trying to create rows manually with a `','.join(something)`. And that `something` is a list containing values that you built out of lookups like `x[i].get(key)`. And one of those `get` calls returned `None` because there was no such key in one of your dicts. — abarnert, Jul 06 '18 at 20:19
@abarnert I think you might be right - I've posted the full example - do you think this could be the issue? — maudulus, Jul 06 '18 at 20:27
I'm running the exact code that I've posted and am getting the issue, with Python 2.7.10. I'm adding the full traceback to my post — maudulus, Jul 06 '18 at 20:33
The traceback that you just posted is the one that you say you already fixed by adding an `r` prefix to the JSON string literal. If that's your only problem, then your question is just a dup of the answer you already found and linked to in your question. If you want help with the _other_ problem, the one you still have, then edit your question to be about that problem: include the `r` in your code, and give us the traceback of the other exception. — abarnert, Jul 06 '18 at 20:36
Also, you can't be running the exact code you posted, because the code you posted raises an `IndentationError` before anything even executes. — abarnert, Jul 06 '18 at 20:39
Sorry, I still can't reproduce the problem. Again, the code you've posted here will not run because it still has the `r` problem, and the Unicode character problem, and the indentation error. But if I fix all of that, [it works](https://repl.it/repls/CoarseOverjoyedObjectdatabase). — abarnert, Jul 06 '18 at 20:59
I can't post all of my data online. I've been trying to anonymize it but it's too much data. Is there some sort of tool you use for this type of thing? I copied your exact example and just put in my data and I'm getting the error `SyntaxError: Non-ASCII character '\xe2' in file runnit2.py on line 345, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details`. — maudulus, Jul 06 '18 at 21:04
I added the `# -*- coding: utf-8 -*-` at the top, and get the error: `Traceback (most recent call last): File "example.py", line 977, in <module> allTags = "\"" + ','.join(x["tags"]) + "\"" TypeError: sequence item 3: expected string or Unicode, NoneType found` — maudulus, Jul 06 '18 at 21:08
@maudulus The data you posted in the codeshare link is definitely invalid JSON. — Salman A, Jul 08 '18 at 19:32
I changed wb+ to w+ in Python 3 and that worked error-free. — Rohit Salunke, Jul 13 '18 at 18:35
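A minimal Python 3 sketch of the raw-string point made in the comments: in a normal string literal, Python turns each \" into a bare ", so json.loads receives an unescaped quote in the middle of a JSON string and fails with the same kind of "Expecting , delimiter" error as in the question. With the r prefix the backslash survives and the JSON parser decodes it as an escaped quote. The one-field sample is a hypothetical cut-down of the question's "brief" value.

```python
import json

# Normal (non-raw) literal: Python turns each \" into a bare ", so the
# JSON parser receives {"brief": "5'0" to 5'4""}, which is invalid JSON.
cooked = """{"brief": "5'0\" to 5'4\""}"""

# Raw literal: the backslashes are kept, so the parser receives \" and
# decodes it as an escaped double quote inside the JSON string.
raw = r"""{"brief": "5'0\" to 5'4\""}"""


def try_load(text):
    """Return the parsed object, or the error message if parsing fails."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return "error: %s" % exc


print(try_load(cooked))
print(try_load(raw))
```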

score 3 · Accepted Answer · answered Jul 10 '18 at 08:56

From your posted sample data, I assume that the second object in the posted JSON (index 1) has a null at index 3 of its "tags" list, i.e. in place of tag7:

"tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],

To get rid of the TypeError raised by the nulls, you can simply check for them and replace them if they exist, as shown below.

x["tags"] = ["" if i is None else i for i in x["tags"]]
allTags = "\"" + ','.join(x["tags"]) + "\""

I have assigned an empty string to replace nulls.

Alternatively, you can remove all falsy elements (None as well as empty strings) by passing None as the function argument to filter().

allTags = "\"" + ','.join(filter(None, x["tags"])) + "\""
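A small sketch reproducing the TypeError and both fixes on their own. The tags list is hypothetical, with None standing in for a JSON null at index 3 (tag7's position). On Python 3 the message reads "sequence item 3: expected str instance, NoneType found" rather than the Python 2 wording in the question.

```python
tags = ["tag4", "tag5", "tag6", None, "tag8"]  # hypothetical: null at index 3

try:
    ",".join(tags)
except TypeError as exc:
    print("join failed:", exc)

# Fix 1: replace each None with an empty string (keeps the empty slot).
replaced = ",".join("" if t is None else t for t in tags)

# Fix 2: drop every falsy element (None and "") with filter(None, ...).
filtered = ",".join(filter(None, tags))

print(replaced)  # tag4,tag5,tag6,,tag8
print(filtered)  # tag4,tag5,tag6,tag8
```

The first fix keeps a visible gap where the null was; the second removes it entirely, which is usually what you want for a tag list.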

NOTE: Also add the r prefix to the JSON string literal (r"""[...]""") and fix the indentation of the for loop.

score 1 · Answer 2 · answered Jul 11 '18 at 19:21

Use a raw string, and open the file in normal (text, not binary) mode with the encoding set to utf-8. On Python 3.6 that is enough.

On Python 2.7 you should use codecs.open('example.csv', 'w', encoding='utf-8') instead of the regular open() when dealing with Unicode content. Also, the csv module on Python 2.7 does not support Unicode out of the box, so I suggest switching to unicodecsv or following the guidelines in this answer.
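A minimal Python 3 sketch of the text-mode advice, assuming the JSON has already been parsed into a list of dicts. It opens the file with encoding="utf-8" and newline="" (the csv module documentation recommends newline="" for files handed to csv.writer), so characters like ” and ’ are written as-is without any .encode() calls. The helper name write_products, the trimmed column list and the sample record are hypothetical. Because csv.writer already quotes fields that contain commas or quotes, the tags are joined without adding quote characters by hand.

```python
import csv
import io


def write_products(products, fileobj):
    """Write one row per product plus one extra row per additional photo.

    Hypothetical helper; the columns are a subset of the question's header.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["Handle", "Title", "Body (HTML)", "Tags", "Variant Price", "Image Src"])
    for product in products:
        # Skip JSON nulls before joining; csv.writer handles the quoting itself.
        tags = ",".join(t for t in product.get("tags", []) if t is not None)
        photos = list(product.get("photos", []))  # copy so the input is not mutated
        first_photo = photos.pop(0) if photos else ""
        writer.writerow([product["title"], product["title"], product.get("description", ""),
                         tags, product.get("price", ""), first_photo])
        for photo in photos:
            writer.writerow([product["title"], "", "", "", "", photo])


products = [{
    "title": "blah blah 17\" Frame",
    "description": "Size L/20\u201d 5\u20198-5\u201911",
    "price": "240",
    "tags": ["tag2", None, "5'0\"-5'4\""],
    "photos": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}]

# Real file: text mode, UTF-8, newline="" as the csv docs recommend.
with open("example.csv", "w", encoding="utf-8", newline="") as f:
    write_products(products, f)

# The same rows in an in-memory buffer, handy for inspecting the output.
buf = io.StringIO()
write_products(products, buf)
print(buf.getvalue())
```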

score 0 · Answer 3 · answered Jul 09 '18 at 09:26

Open the file for writing in text mode ("w") instead of binary mode ("wb"). If you must use "wb", use the helper functions below. You also need to add r in front of the string literals to handle special symbols.

import csv
import json

x = r"""[
      {
        "brief": "Brief 1",
        "description": "Description 1",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
        ],
        "price": "145",
        "tags": [
          "tag1",
          "tag2",
          "tag3"
        ],
        "title": "Title 1"
      },
      {
        "brief": "Brief 2",
        "description": "Description 2",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
        ],
        "price": "150",
        "tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],
        "title": "Title 2"
      },{
        "brief": "blah blah 5'0\" to 5'4\"",
        "buyerPickup": true,
        "condition": "Good",
        "coverShipping": false,    
        "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
        ],
        "price": "240",
        "tags": [
          "tag2",
          "5'0\"-5'4\""
        ],
        "title": "blah blah 17\" Frame",
        "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
      } 
    ]"""

x = json.loads(x)


def to_str(bytes_or_str):
    # Decode bytes to str; leave str values unchanged.
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str

    return value


f = csv.writer(open("example.csv", "w+"))
writeList = ["Handle", "Title", "Body (HTML)", "Vendor", "Type", "Tags", "Published", "Option1 Name", "Option1 Value",
             "Variant Inventory Qty", "Variant Inventory Policy", "Variant Fulfillment Service", "Variant Price",
             "Variant Requires Shipping", "Variant Taxable", "Image Src"]
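newList = []
# Converts the header names to bytes (for "wb" mode). Note: with a text-mode
# file on Python 3, csv.writer writes bytes as "b'...'", so skip this there.
for item in writeList:
    newList.append(to_bytes(item))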

f.writerow(newList)

for x in x:

    allTags = r"\"" + ','.join(x["tags"]) + r"\""
    images = x["photos"]
    f.writerow([x["title"],
                x["title"],
                x["description"],
                "Vendor Name",
                "Widget",
                allTags,
                "TRUE",
                "Title",
                "Default Title",
                "1",
                "deny",
                "manual",
                x["price"],
                "TRUE",
                "TRUE",
                images.pop(0) if images else None])
    while images:
        f.writerow([x["title"], None, None, None, None, None, None, None, None, None, None, None, None, None, None,
                    images.pop(0)])

This does not address the issue pointed out in the question, which is the TypeError — Marlon Abeykoon, Jul 10 '18 at 08:58

score 0 · Answer 4 · answered Jul 15 '18 at 17:36

Possible duplicate of this question: how to convert characters like \x22 into string

After cleaning up the code, the error boils down to:

import json

x = '''
  {
    "brief": "\""
  }'''

x = json.loads(x)

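Consider replacing \" with \u201d. Note that \u201d is the right double quotation mark (”), not the straight double quote, so this changes the data: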

import json

x = '{"brief": "\u201d"}'

x = json.loads(x)
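If the straight double quote itself should be preserved, a sketch of two alternatives that keep it, assuming the JSON stays embedded in the source file as in the question: double the backslash in a normal literal, or use a raw literal as the accepted answer suggests. The single-field strings are illustrative only.

```python
import json

# Normal literal with the backslash doubled: Python keeps one backslash,
# so the JSON parser receives \" and decodes a straight double quote.
escaped = '{"brief": "\\""}'

# Raw literal: the backslash is kept as-is, with the same result.
raw = r'{"brief": "\""}'

# The \u201d replacement yields a different character (a curly quote).
curly = '{"brief": "\u201d"}'

print(json.loads(escaped)["brief"] == '"')     # True
print(json.loads(raw)["brief"] == '"')         # True
print(json.loads(curly)["brief"] == "\u201d")  # True
```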
