
This is the JSON data:

 {
    "maxPage" : 145,
    "previous_cursor" : null,
    "next_cursor": 1420,
    "data": {
        "2427459624": {
            "nick": "\u5c0f\u767d\u6843\u82b1\u773cGy",
            "fans": 565,
            "vip": 0,
            "avantar": "http: \/\/tp1.sinaimg.cn\/2427459624\/30\/5614847484\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "\u5f88\u591a\u65f6\u5019\u4e36\u535f\u53bb\u8bf4\u4e36\u535f\u53bb\u505a\u4e36\u535f\u53bb\u60f3\u4e36\u535f\u4ee3\u8868\u535f\u5728\u4e4e\u3002",
            "location": "\u9ed1\u9f99\u6c5f \u7261\u4e39\u6c5f",
            "text": "@\u975e\u9c7c-CC \u6211\u56de\u8d60\u4e86\u6e29\u99a8\u793c\u76d2\u7ed9\u4f60\u4eec\u3002\u4e00\u8d77\u6765\u73a9\u5fae\u57ce\u5e02\u5427\uff01\u5f00\u59cb\u6e38\u620fhttp: \/\/t.cn\/ak39KS",
            "textTime": "\u4eca\u5929 13: 30",
            "distance": ""
        },
        "2574743404": {
            "nick": "\u798f\u5efa\u65f6\u5c1a\u751f\u6d3b",
            "fans": 52,
            "vip": 0,
            "avantar": "http: \/\/tp1.sinaimg.cn\/2574743404\/30\/5618976622\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "\u798f\u5efa\u65f6\u5c1a\u751f\u6d3b\u7cbe\u5f69\u8d44\u8baf\u63a8\u8350",
            "location": "\u798f\u5efa",
            "text": "\u5206\u4eab\u76f8\u518c\uff1aJil Sander 2012\u6625\u590f\u6d41\u884c\u53d1\u5e03 \uff08\u914d\u9970\uff09\u3001\u7cbe\u5f69\u56fe\u7247\u63a8\u8350\uff1aJil Sander 2012\u6625\u590f\u6d41\u884c\u53d1\u5e03 \uff08\u914d\u9970\uff09 [78]\uff0809\u670826\u65e5\u4e0a\u4f20\uff09\u3001\u6d4f\u89c8\u5168\u90e884\u5f20\u8d85\u9ad8\u6e05\u5927\u56fe  http: \/\/t.cn\/ScMjKe  \uff08\u5206\u4eab\u81ea @\u7acb\u4f53\u4eba\u751f\u7f51\u7ad9\uff09",
            "textTime": "2011-12-16 12: 27: 15",
            "distance": ""
        },
        "2278308024": {
            "nick": "\u65e0\u654cde\u5e2d\u5c0f\u82b1\u732b",
            "fans": 158,
            "vip": 0,
            "avantar": "http: \/\/tp1.sinaimg.cn\/2278308024\/30\/5609016681\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "",
            "location": "\u56db\u5ddd \u6210\u90fd",
            "text": "\u4eba\u751f\u6709\u4e24\u79cd\u5883\u754c\uff1a\u4e00\u662f\u75db\u800c\u4e0d\u8a00\uff0c\u4e8c\u662f\u7b11\u800c\u4e0d\u8bed\u3002\u75db\u800c\u4e0d\u8a00\u662f\u4e00\u79cd\u667a\u6167\uff0c\u4eba\u751f\u5728\u4e16\uff0c\u5f80\u5f80\u4f1a\u56e0\u8fd9\u6837\u6216\u90a3\u6837\u7684\u4f24\u5bb3\u800c\u5fc3\u75db\u4e0d\u5df2\u3002\u5bf9\u575a\u5f3a\u7684\u4eba\u6765\u8bf4\uff0c\u7d2f\u7d2f\u4f24\u75d5\u662f\u751f\u547d\u8d50\u4e88\u7684\u6700\u597d\u793c\u7269\u3002\u7b11\u800c\u4e0d\u8bed\u662f\u4e00\u79cd\u8c41\u8fbe\uff0c\u670b\u53cb\u95f4\u7684\u620f\u8650\uff0c\u906d\u4eba\u8bef\u89e3\u540e\u7684\u65e0\u5948\uff0c\u8fc7\u591a\u7684\u8a00\u8f9e\u7533\u8fa9\u53cd\u8ba9\u4eba\u89c9\u5f97\u534e\u800c\u4e0d\u5b9e\uff0c\u83ab\u4e0d\u5982\u7559\u4e0b\u4e00\u62b9\u5fae\u7b11\uff0c\u4efb\u4ed6\u4eba\u4f5c\u8bc4\u3002",
            "textTime": "02\u670812\u65e5 20: 57",
            "distance": ""
        },
        "2264791490": {
            "nick": "\u90b1\u971e\u98de\u81ed\u7f8e\u599e",
            "fans": 169,
            "vip": 0,
            "avantar": "http: \/\/tp3.sinaimg.cn\/2264791490\/30\/5609016690\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "",
            "location": "\u91cd\u5e86 \u53cc\u6865\u533a",
            "text": "[\u5a01\u6b66]",
            "textTime": "\u4eca\u5929 14: 32",
            "distance": ""
        },
        "2469397011": {
            "nick": "\u9633\u5149\u7537\u5b69-\u5f20\u552f",
            "fans": 1356,
            "vip": 0,
            "avantar": "http: \/\/tp4.sinaimg.cn\/2469397011\/30\/5616484523\/1",
            "ta": "\u4ed6",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "",
            "location": "\u4e0a\u6d77 \u5f90\u6c47\u533a",
            "text": "@\u5409\u7c73is\u963f\u8bb8 \u7684\u5fae\u7fa4 \"\u63a2\u5e97\u4e4b\u65c5\u5bfb\u627e\u9876\u7ea7\u7f8e\u98df\" \u633a\u4e0d\u9519\u7684 http: \/\/t.cn\/S70CwE \u63a8\u8350\u5927\u5bb6\u4e5f\u6765\u770b\u770b~ @\u6625\u5a07\u4e0e\u5fd7\u660e\u8bed\u5f55 @\u5b89\u59ae\u9759\u513f\u5e78\u798f @richesse-du-vin @\u8c22\u701a\u6dd8 @\u5e38\u7389\u7126 @Mr\u918b @Snake_\u5b5f\u5f37 @\u8c22\u4f1f- @\u627f\u5fb7\u9b4f\u8273\u541b @\u798f\u53f0\u4e0a\u6d77\u9910\u996e\u7ba1\u7406\u6709\u9650\u516c\u53f8",
            "textTime": "02\u670813\u65e5 22: 57",
            "distance": ""
        },
        "2533141051": {
            "nick": "80\u540e\u8d2d\u7269\u72c2--",
            "fans": 351,
            "vip": 0,
            "avantar": "http: \/\/tp4.sinaimg.cn\/2533141051\/30\/5623765150\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "\u6dd8\u5b9d\u4e4b\u524d\u767b\u5f55www.ejiamen.com\uff0c\u7136\u540e\u53bb\u4e70\uff0c\u80fd\u8fd4\u5229\u4e0d\u5c11\uff0c\u6536\u85cf\u597d\u5730\u5740\u5427\uff0c\u5206\u4eab\u4e0b\uff01",
            "location": "\u5317\u4eac",
            "text": "\u301010\u9053\u9898\u63ed\u79d8\u4f60\u662f\u4ec0\u4e48\u6837\u7684\u4eba\u3011http: \/\/t.cn\/zOwG4wS \u8fd9\u4e2a\u6d4b\u8bd5\u662f\u83f2\u5c14\u535a\u58eb\u5728\u8457\u540d\u5973\u9ed1\u4eba\u6b27\u666e\u62c9\u7684\u8282\u76ee\u91cc\u505a\u7684\uff0c\u6ee1\u51c6\u786e\u7684\u3002\u7b54\u590d\u662f\u4f9d\u73b0\u5728\u7684\u60a8\uff0c\u4e0d\u8981\u4f9d\u8fc7\u53bb\u7684\u60a8\u3002\u8fd9\u662f\u4e00\u4e2a\u76ee\u524d\u5f88\u591a\u5927\u516c\u53f8\u4eba\u4e8b\u90e8\u95e8\u5b9e\u9645\u91c7\u7528\u7684\u6d4b\u8bd5\u3002",
            "textTime": "02\u670811\u65e5 15: 30",
            "distance": ""
        },
        "2439491632": {
            "nick": "\u5feb\u4e50\u535a\u5f69",
            "fans": 571,
            "vip": 0,
            "avantar": "http: \/\/tp1.sinaimg.cn\/2439491632\/30\/5613095351\/1",
            "ta": "\u4ed6",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "",
            "location": "\u4e0a\u6d77 \u6d66\u4e1c\u65b0\u533a",
            "text": "\u521a\u521a\u5728#\u5fb7\u514b\u8428\u65af\u6251\u514b#\u5347\u7ea7\u4e86\uff0c\u76ee\u524d\u7684\u7b49\u7ea7\u662f\"\u4e9a\u6d32\u767e\u4e07\u5bcc\u7fc1\u2605100W\"\uff0c \u60f3\u4f53\u9a8c\u5417\uff1f\u5feb\u6765\u52a0\u5165\u5427\uff01\u5f00\u59cb\u6e38\u620fhttp: \/\/t.cn\/Szyr0q",
            "textTime": "\u4eca\u5929 00: 02",
            "distance": ""
        },
        "2179340374": {
            "nick": "\u5c18\u604bYY",
            "fans": 711,
            "vip": 0,
            "avantar": "http: \/\/tp3.sinaimg.cn\/2179340374\/30\/5603281798\/0",
            "ta": "\u5979",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "\u6d3b\u6cfc\uff0c\u5f00\u6717\u7684\u5973\u5b69\uff0c\u559c\u6b22\u7ed3\u4ea4\u670b\u53cb\uff0c\u559c\u6b22\u53eb\u670b\u53cb\u7684\u5c31\u52a0\u6211\u5427\uff01",
            "location": "\u6c5f\u82cf \u5357\u4eac",
            "text": "\u3010<\u6dd1\u5973\u8863\u6a71>\u3011 \http: \/\/t.cn\/SLNCec",
            "textTime": "02\u670815\u65e5 09: 29",
            "distance": ""
        },
        "2572701804": {
            "nick": "\u673a\u573a\u7eff\u8272\u901a\u9053",
            "fans": 2607,
            "vip": 0,
            "avantar": "http: \/\/tp1.sinaimg.cn\/2572701804\/30\/5617561075\/1",
            "ta": "\u4ed6",
            "relation": 0,
            "canMsg": 0,
            "vipReason": "",
            "description": "@\u673a\u573a\u7eff\u8272\u901a\u9053 \u5b98\u65b9\u5fae\u535a\u3002\u514d\u8d39\u63d0\u4f9b\u4ee3\u6362\u767b\u673a\u724c\uff0c\u8981\u5ba2\u901a\u9053\u7b49\u670d\u52a1\u3002APP\u5723\u8bde\u8282\u4e0a\u7ebf\uff0c\u656c\u8bf7\u5173\u6ce8^_^",
            "location": "\u4e0a\u6d77",
            "text": "\u4ee5\u540e\u4f7f\u7528\u6211\u4eec\u7684app\u5e94\u7528\u5c31\u4e0d\u4f1a\u8bef\u673a\uff0c\u5e2e\u4f60\u4ee3\u6362\u767b\u673a\u724c\uff0c\u53ea\u9700\u63d0\u524d25\u5206\u949f\u5230\u673a\u573a\u5373\u53ef\u3002",
            "textTime": "2011-12-12 17: 40: 21",
            "distance": ""}
        },
    "ok": 1,
    "msg": "\u6210\u5458\u5217\u8868\u83b7\u53d6\u6210\u529f"
}

The error message is: Invalid \escape: line 1 column 5863 (char 5863)

Why?

I asked a similar question, "Python json.loads fails with `ValueError: Invalid control character at: line 1 column 33 (char 33)`", and got the right answer, but that covered only part of the original JSON data; this is the entire JSON data. I am hoping to find a general solution.

福气鱼
2 Answers


The problematic part is here:

"text":"\u3010<\u6dd1\u5973\u8863\u6a71>\u3011 \http:\/\/t.cn\/SLNCec"
                                               ^-- character 5863

\h is not a valid escape sequence. Where does this invalid piece of data come from?
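One way to find such a spot without counting characters is to catch the decoder's exception and print the text around the reported position. The following is a minimal sketch, assuming Python 3.5 or later (where `json.JSONDecodeError` exposes `pos` and `msg`); the payload `raw` and the helper `show_error_context` are hypothetical stand-ins, not the real API response:

```python
import json

# Hypothetical miniature of the failing payload: a lone backslash before "http".
raw = r'{"text": "\u3010<\u6dd1\u5973\u8863\u6a71>\u3011 \http:\/\/t.cn\/SLNCec"}'

def show_error_context(text, width=10):
    """Return (message, surrounding text) if json.loads fails, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the 0-based index at which the decoder gave up
        start = max(exc.pos - width, 0)
        return exc.msg, text[start:exc.pos + width]
    return None

result = show_error_context(raw)
print(result)
```

The exact position reported can differ slightly between the C and pure-Python decoders, so the window is deliberately a few characters wide on both sides.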

In your sample string, the backslash occurs in three "valid" situations:

  • before a /
  • as the start of a Unicode escape sequence
  • before a " (to escape an embedded quote).

You can convert all other backslashes to double backslashes like this:

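import re
# Match a backslash that is not followed by /, u, or " (the three valid cases above)
regex = re.compile(r'\\(?![/u"])')
# `original` is the raw JSON text; each stray backslash is doubled
fixed = regex.sub(r"\\\\", original)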
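As a rough illustration, here is the same regex applied to a small, made-up payload (`broken` is hypothetical, not the real API response). The lookahead only knows the three escapes that occur in this sample, so the sketch also includes a broader variant, `escape_stray_backslashes`, which is an assumption of mine rather than part of the answer: it walks the text one backslash pair at a time and keeps every escape that JSON allows.

```python
import json
import re

# Hypothetical miniature payload with one stray backslash before "http".
broken = r'{"text": "\u3010 \http:\/\/t.cn\/SLNCec", "quote": "say \"hi\""}'

# The regex from the answer: double any backslash not followed by /, u, or ".
stray_backslash = re.compile(r'\\(?![/u"])')
fixed = stray_backslash.sub(r'\\\\', broken)
parsed = json.loads(fixed)
print(parsed["text"])

# Broader variant (an assumption, not from the answer): consume backslash
# pairs left to right, so sequences like \\ and \n are left alone.
VALID_ESCAPE_CHARS = '"\\/bfnrtu'

def escape_stray_backslashes(text):
    def repl(match):
        pair = match.group(0)
        # Keep valid JSON escapes; prefix anything else with a backslash.
        return pair if pair[1] in VALID_ESCAPE_CHARS else '\\' + pair
    return re.sub(r'\\(.)', repl, text)

broken2 = r'{"a": "x\\y \h \n"}'
parsed2 = json.loads(escape_stray_backslashes(broken2))
print(repr(parsed2["a"]))
```

Either version is a repair of already-broken input, so it guesses that a stray backslash was meant literally; if the API can be fixed at the source, that is the safer route.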
Tim Pietzcker
  • If I don't know whether there are other broken situations like this, how can I avoid the problem? The data comes from an API, so I want to know whether I can escape it or otherwise change it. – 福气鱼 Feb 16 '12 at 13:46
  • I'm sorry, but this method is not correct. It still gives the message "ValueError: Expecting , delimiter: line 1 column 3315 (char 3315)". – 福气鱼 Feb 16 '12 at 14:32
  • @user627670: There was an error in my `re.sub()` call which I have now fixed. However, I'm also getting this error. It appears when there is an escaped quote in the string. This puzzles me since that's the correct way to handle embedded quotes in JSON. Will have to look into this a bit more. – Tim Pietzcker Feb 16 '12 at 14:50
  • @user627670: OK, I think I see the problem. When copying your string into a Python script for testing, the escaped quotes are ignored by Python. Try `s = '\"'` and then `repr(s)` - the backslash is gone. But that only happens in this situation; it shouldn't be a problem when reading the string from a file. – Tim Pietzcker Feb 16 '12 at 15:00
  • The original error message is: Invalid \escape: line 1 column 5863 (char 5863). The original JSON data was returned by urllib's open().read() method, and the JSON data above comes from my log file, so I don't know whether Python changed anything when I wrote the original data to the file. – 福气鱼 Feb 16 '12 at 15:09

The parts that caused problems when I tried to load JSON from your string are:
\" : json.decoder.JSONDecodeError: Expecting ',' delimiter:
\h : json.decoder.JSONDecodeError: Invalid \escape:
You need to change them to \\\" and \\\h respectively, so that \\\h becomes \\h in the string and then \h after parsing.
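The `Expecting ',' delimiter` error most likely appears only when the JSON is pasted into an ordinary Python string literal, because Python consumes the backslash in `\"` before `json.loads` ever sees it. A minimal sketch of the difference, using a made-up one-field payload (`plain` and `raw` are hypothetical names):

```python
import json

# Non-raw literal: Python itself turns \" into a bare quote.
plain = '{"text": "a \"quoted\" word"}'
# Raw literal: backslashes reach json.loads unchanged, as they would from a file.
raw = r'{"text": "a \"quoted\" word"}'

plain_error = None
try:
    json.loads(plain)
except json.JSONDecodeError as exc:
    plain_error = exc.msg
print(plain_error)

text = json.loads(raw)["text"]
print(text)
```

When testing with pasted data, a raw string literal (or reading the text from a file) avoids the need to triple the backslashes by hand.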

Kallzvx