
Is there any way to parse a Python list in PHP?

I have data coming from Python stored in MySQL, something like this:

[{u'hello': u'world'}]

I need to use it in a PHP script. The data is almost valid JSON; the only differences are the leading u' prefixes and the single quotes.

I can replace every u' with ' and then every ' with " to turn it into JSON. However, if a value itself contains a ', that quote is also replaced by " and the JSON breaks.
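To see why the blanket replacement fails, here is a small Python 3 sketch that applies the same two replacements and hands the result to a JSON parser. It is written in Python rather than PHP purely for illustration, and the `naive_convert` helper is hypothetical. The second input writes the value the way Python's repr() writes a string that contains an apostrophe, with double quotes:

```python
import json


def naive_convert(text):
    # Hypothetical helper mirroring the two blanket replacements described above:
    # strip the u' prefixes, then turn every single quote into a double quote.
    return text.replace("u'", "'").replace("'", '"')


ok = naive_convert("[{u'hello': u'world'}]")
print(ok)              # [{"hello": "world"}]
print(json.loads(ok))  # [{'hello': 'world'}]

# The u before the double-quoted value survives, and the apostrophe in the
# value becomes a stray double quote, so the result is no longer JSON.
broken = naive_convert("""[{u'hello': u"I don't know"}]""")
print(broken)          # [{"hello": u"I don"t know"}]
try:
    json.loads(broken)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("invalid JSON:", exc)
```

Plain string replacement cannot tell a quote that delimits a string from a quote inside one; that requires an actual parser for Python literal syntax.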

I have tried a lot of things, but none of them produced valid JSON. So my question is: is there any way to parse Python-generated list/JSON-like data in PHP? I don't mind using a third-party library or similar; I just want to get the data parsed.

halfer
Tomas
  • Ideally, you should fix that Python program so that it's exporting actual JSON, using `json.dumps()`, rather than simply printing the `repr()` of the Python data structure. – PM 2Ring Feb 03 '16 at 13:04
  • By `u"` you mean `u''`, don't you? – felipsmartins Feb 03 '16 at 13:04
  • @felipsmartins: If the string contains a single quote then its Python representation will use double-quotes instead of single-quotes. – PM 2Ring Feb 03 '16 at 13:05
  • Reviewer http://stackoverflow.com/a/34479722/2153237 – Jose Carlos Ramos Carmenates Feb 03 '16 at 13:06
  • The proper way is to store correctly serialized data. So, when you write data to MySQL, add a serialization step: json.dumps(your_list_of_dicts). Then you don't need workarounds to parse non-standard strings anywhere. – Aleksandr K. Feb 03 '16 at 13:07
  • I understand all the comments above, but since the Python code is out of my hands, is there any way to parse it in PHP? – Tomas Feb 03 '16 at 13:10
  • @PM2Ring The OP wrote `u"`, but that can't be right, since it would cause a syntax error. `u"` is not the same as `u''` or `u""`. – felipsmartins Feb 03 '16 at 13:10
  • @Tom Can you post an example that covers all the use cases, so I can help? – felipsmartins Feb 03 '16 at 13:13
  • Why is it out of your hands? No external API could possibly be returning data in this format, which means the Python script must be something you or a colleague has written. You or they should fix it. – Daniel Roseman Feb 03 '16 at 13:16
  • @felipsmartins Basically `[[{u'foo:u'bar, u'hello:u'I don't know'}]]`; if it handles that, it should solve my problem. Alternatively, if I could replace `'` with `"` ONLY around keys and values but not inside values, that would do it. – Tomas Feb 03 '16 at 13:16
  • @DanielRoseman I am getting the data from a database that is fed by a third-party system... – Tomas Feb 03 '16 at 13:17
  • Don't you have access to Python so you could use it to parse the data and then transform it into something PHP understands? – deceze Feb 03 '16 at 13:18
  • @deceze I suppose I could use the command line somehow, but I was hoping for a more elegant solution. – Tomas Feb 03 '16 at 13:19
  • How about if your PHP code calls a Python script to convert the data into legal JSON? – PM 2Ring Feb 03 '16 at 13:19
  • Quite honestly, putting it through a small Python script *is* the most elegant way IMO. Otherwise you'd have to replicate the Python string literal parser in PHP, which I would not attempt unless absolutely necessary. – deceze Feb 03 '16 at 13:21
  • @deceze Can you give an example of that? – Tomas Feb 03 '16 at 13:24
  • I guess there's no point running a Python program over your database to fix all the broken JSON, since that 3rd party system's just going to pollute it with more broken stuff in the future. – PM 2Ring Feb 03 '16 at 13:25
  • You ***really*** ought to report this as a bug to the third party...! – deceze Feb 03 '16 at 13:25
  • @Tom `[[{u'foo:u'bar, u'hello:u'I don't know'}]]` is an invalid structure. – felipsmartins Feb 03 '16 at 13:25
  • Right, I'm going to use the command line for now and push them to change it. Unfortunately, mine is not the only database they are polluting; it just happens to be the only one that doesn't use Python on my end. Thanks, everyone. – Tomas Feb 03 '16 at 13:30
  • If you really have strings in the database that look like `[[{u'foo:u'bar, u'hello:u'I don't know'}]]` then things are even worse than we thought. Hopefully, it's actually more like `[[{u'foo': u'bar', u'hello': u"I don't know"}]]`, which _is_ legal Python. – PM 2Ring Feb 03 '16 at 13:32
  • Is the data inside this broken JSON just plain ASCII, or does it contain fancy stuff like accented characters? By default, `json.dumps` produces UTF-8 encoded output, but there are ways of dealing with that if you want plain ASCII. – PM 2Ring Feb 03 '16 at 13:37
  • @PM2Ring Sorry, that was my typo; the strings are legal Python. The values can contain virtually anything. – Tomas Feb 03 '16 at 15:37
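The producer-side fix suggested in the comments (store `json.dumps()` output instead of the `repr()` of the data) can be sketched in Python 3 as follows. The `rows` value is made up for illustration. Python 3's repr() no longer prints the u prefix that Python 2 adds to unicode strings, but the quote-selection rule PM 2Ring describes is the same:

```python
import json

# Made-up data standing in for what the third-party system stores.
rows = [{"foo": "bar", "hello": "I don't know"}]

# repr() writes a Python literal: single-quoted strings, except that a string
# containing an apostrophe (and no double quote) switches to double quotes.
as_repr = repr(rows)
print(as_repr)  # [{'foo': 'bar', 'hello': "I don't know"}]

# A string containing both quote characters keeps single quotes and escapes them.
print(repr('it\'s "x"'))  # 'it\'s "x"'

# json.dumps() always writes JSON: double-quoted strings with JSON escapes,
# which PHP's json_decode() can read without any string surgery.
as_json = json.dumps(rows)
print(as_json)  # [{"foo": "bar", "hello": "I don't know"}]
```

Because repr() chooses its quote character per string, a consumer that only knows JSON has to reimplement Python's literal syntax, whereas the JSON output needs no special handling.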

1 Answer


If you have access to Python, you can convert the data to JSON from the command line. Here's an example:

$ echo "{u'key': u'value'}" |\
  python -c "import sys, json, ast; print(json.dumps(ast.literal_eval(sys.stdin.read())))"

{"key": "value"}

Here's the same Python one-liner, formatted as a script:

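import sys, json, ast
# Parse the Python literal read from stdin without executing any code.
data = ast.literal_eval(sys.stdin.read())
# Re-serialize the resulting lists/dicts/strings as JSON.
print(json.dumps(data))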
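If the conversion has to run over many rows, a slightly more defensive version of the script above may help. This is a hedged Python 3 sketch: the names `python_literal_to_json` and `convert_lines` are hypothetical, and the one-literal-per-line input format is an assumption about how PHP would feed in the data:

```python
import ast
import json


def python_literal_to_json(text):
    """Convert one Python literal (such as the repr of a list of dicts) to JSON.

    Raises ValueError if the text is not a valid Python literal, or if it
    contains a value that JSON cannot represent.
    """
    try:
        # strip() removes the trailing newline and any leading whitespace.
        data = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError("not a valid Python literal: %r" % text) from exc
    try:
        # json.dumps raises TypeError for types JSON lacks, e.g. sets or bytes.
        return json.dumps(data)
    except TypeError as exc:
        raise ValueError("literal has no JSON equivalent: %r" % text) from exc


def convert_lines(lines):
    """Yield one JSON string per non-blank input line (one literal per line)."""
    for line in lines:
        if line.strip():
            yield python_literal_to_json(line)
```

As a script, something like `for out in convert_lines(sys.stdin): print(out)` would emit one JSON document per line for PHP to decode with `json_decode`. Raising ValueError on a bad row, rather than printing whatever was produced, keeps malformed input from silently turning into broken JSON.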

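By using ast.literal_eval instead of the built-in eval, we can evaluate the Python literal (here a dictionary) without worrying about code execution vulnerabilities, because literal_eval only accepts literals such as strings, numbers, tuples, lists, dicts, sets, booleans and None.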
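A quick Python 3 sketch of that difference. Python 3.3 and later accept the u'' prefix again, so the same kind of input works there. literal_eval turns a literal into plain data but raises ValueError for anything that is not a literal, such as a function call that eval would happily execute:

```python
import ast

# A legal Python literal, including a value that contains an apostrophe,
# is parsed into ordinary lists, dicts and strings.
data = ast.literal_eval("""[{u'hello': u"I don't know"}]""")
print(data)  # [{'hello': "I don't know"}]

# Something eval() would execute (a function call) is refused with ValueError.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```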

Håken Lid
  • `eval` can be [dangerous](http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html) to use on arbitrary data. It's _far_ better to use [ast.literal_eval](https://docs.python.org/2/library/ast.html#ast.literal_eval) for something like this. – PM 2Ring Feb 03 '16 at 13:27
  • I've changed the answer to include your suggestion. – Håken Lid Feb 03 '16 at 13:29
  • Per http://stackoverflow.com/questions/9949533/python-eval-vs-ast-literal-eval-vs-json-decode, why not just use json.loads instead of ast.literal_eval ? – parkamark Feb 03 '16 at 13:45
  • @parkamark Because the input is a Python literal, not JSON. – Håken Lid Feb 03 '16 at 13:49
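To make the point concrete, here is a small Python 3 sketch that feeds the string from the answer's example to both parsers:

```python
import ast
import json

python_literal = "{u'key': u'value'}"

# json.loads only accepts JSON syntax (double-quoted strings, no u prefix).
try:
    json.loads(python_literal)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("json.loads failed:", exc)

# ast.literal_eval understands Python literal syntax, so it succeeds.
print(ast.literal_eval(python_literal))  # {'key': 'value'}
```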