I want to read a Python dictionary string using Java. Example string:

{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}

This is not valid JSON. I want to convert it into proper JSON using Java code.

Devavrata
  • Interesting assignment. And what is your question? I agree with the following comment: why spend energy parsing a non-standard format instead of making sure you emit JSON on the Python side?! – GhostCat Apr 26 '17 at 12:33
  • As this is not proper JSON, I am not able to load it in Java. I am actually using Scala and the json4s library. – Devavrata Apr 26 '17 at 12:34
  • @GhostCat It is not possible in my case. These strings are saved in DB – Devavrata Apr 26 '17 at 12:34
  • Would probably be easiest to make sure it's in JSON before it leaves Python. Rather than trying to make Java understand Python's string representation of a dictionary – Cruncher Apr 26 '17 at 12:35
  • @Devavrata then convert them to JSON as they get into the database. Saving non-standard formats into a DB spells trouble – Cruncher Apr 26 '17 at 12:36
  • And worst case: couldn't you have a python bridge "near" to your database? Meaning: let python either pull the data; or pull the string; push it in python; and fetch JSON from python. – GhostCat Apr 26 '17 at 12:37
  • @Cruncher True. But is there any way to do it? This one is legacy, and making a change in the production DB is a big task. – Devavrata Apr 26 '17 at 12:38
  • @Devavrata of course there's a way. If python can parse it, so can Java. You just have to do some work, and if no one has written the parser in Java yet, and you have to do it yourself, it's very prone to missing edge cases – Cruncher Apr 26 '17 at 12:42
  • As ghostcat says, you can make a Python service near to the database for the sole purpose of interpreting these values. – Cruncher Apr 26 '17 at 12:44
  • Perhaps you should use Jython to allow you to pass values to a Python interpreter within Java and let it return that JSON to you. – RealSkeptic Apr 26 '17 at 12:45
  • @RealSkeptic One minute too late ... Jython was already written down ;-) – GhostCat Apr 26 '17 at 12:48

3 Answers

Well, the best way would be to pass it through a Python script that reads that data and outputs valid JSON:

>>> import ast, json
>>> json.dumps(ast.literal_eval("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"))
'{"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}'

So you could create a script that only contains:

import sys, json, ast; print(json.dumps(ast.literal_eval(sys.argv[1])))

Then you can turn it into a Python one-liner like so:

python -c "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))" "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"

You can run that from your shell, which means you can run it from within Java the same way:

String PythonData = "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}";

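String[] cmd = {
    "python", "-c", "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))",
    PythonData
};
// exec(String[]) passes each array element as its own argument, so no shell quoting is involved
Process process = Runtime.getRuntime().exec(cmd);
// The script prints the JSON on a single line of its standard output
String json = new java.io.BufferedReader(new java.io.InputStreamReader(process.getInputStream())).readLine();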

As output you'll have a proper JSON string.

This solution is the most reliable way I can think of, as it safely parses any Python literal syntax (it uses the Python parser to do so) without opening a window for code injection.

But I wouldn't recommend using it as is, because you'd be spawning a Python process for each string you parse, which would be a performance killer.

As an improvement on top of that first approach, you could use Jython to run that Python code in the JVM for a bit more performance.

PythonInterpreter interpreter = new PythonInterpreter();
// exec() runs statements; eval() only accepts expressions, so it cannot do the import or the assignment
interpreter.exec("import json, ast");
interpreter.exec("to_json = lambda d: json.dumps(ast.literal_eval(d))");
PyObject toJson = interpreter.get("to_json");
PyObject result = toJson.__call__(new PyString(PythonData));
String realResult = (String) result.__tojava__(String.class);

The above is untested (so it's likely to fail and spawn dragons) and I'm pretty sure you can make it more elegant. It's loosely adapted from this answer. I'll leave it up to you as an exercise to see how you can include the Jython environment in your Java runtime ☺.


P.S.: Another solution would be to try to fix every pattern you can think of using one gigantic regex or several smaller ones. Even if that might work in simpler cases, I would advise against it: regex is the wrong tool for the job, as it isn't expressive enough and you'll never be comprehensive. It's only a good way to plant the seed of a bug that'll bite you at some point in the future.


P.S.2: Whenever you need to parse code from an external source, always make sure that data is sanitized and safe. Never forget about Little Bobby Tables.

zmo
  • This makes a lot of sense actually – Cruncher Apr 26 '17 at 12:45
  • Nice and straight forward solution ... and I think together with my suggestions, it becomes even more interesting. Any feedback is welcome ... – GhostCat Apr 26 '17 at 12:46
  • Though I would be cautious about taking data from a database and shoving it into an exec... – Cruncher Apr 26 '17 at 12:47
  • @Cruncher I've been thinking of ways to circumvent it, but because that code does an `exec` of Python, the code that runs is exactly the one-liner written above, and the variable is passed as an `argv` argument to the [`literal_eval` function](http://stackoverflow.com/questions/15197673/using-pythons-eval-vs-ast-literal-eval), this code is pretty safe against the usual exploits. – zmo Apr 26 '17 at 12:49
  • You're probably right. Since you're using args and not just shoving it in with string concatenation. Just saying you always need to be cautious. Easy to mess something like this up you know? – Cruncher Apr 26 '17 at 12:53
  • Will Jython start a Python interpreter, or will it act as just another Java library? I am concerned about its performance because I am going to parse nearly 10 million of these strings. – Devavrata Apr 26 '17 at 14:16
  • Well, it's a Python interpreter that runs in the JVM. So the upside is that you can reuse the interpreter instance and avoid the cost of spawning it for each of the 10M strings. The downside is that it's still a huge overhead. You'd be better off connecting to the database using Python, creating a new field next to the one you've got, and for each row building the JSON from the Python dict and storing it in the new JSON field. If that's an operation you only run once over the whole database, it'll be the most efficient. – zmo Apr 26 '17 at 14:27
  • Hmm, correct. I think it's not even thread safe, so I may need an interpreter per thread; I will look into that. Thanks for your answer. I was expecting some lightweight library for this :) – Devavrata Apr 26 '17 at 14:42
  • @zmo Can a single PythonInterpreter object be called from multiple threads? – Devavrata Apr 28 '17 at 14:53
  • I don't know, check the documentation. But when in doubt, consider it's not thread safe. – zmo Apr 28 '17 at 14:57

In addition to the other answer: it is straightforward to simply invoke that Python one-liner to "translate" a Python dict string into a standard JSON string.

But spawning a new process for each row in your database might quickly turn into a performance killer.

Thus there are two options that you should consider on top of that:

  • Establish a small "Python server" that keeps running; its only job is to do that translation for JVMs that connect to it.
  • Look into Jython, meaning: simply enable your JVM to run Python code. In other words: instead of writing your own Python dict string parser, you add "Python powers" to your JVM and rely on existing components to do that translation for you.
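
The first option can be approximated without any network code by keeping one Python child process alive and exchanging one line per value over its standard input and output. The following is a minimal, untested-against-your-data sketch: the class name `PyLiteralWorker`, its `toJson` method and the embedded worker script are all hypothetical names made up for illustration, and it assumes a `python` executable is available and that each stored value fits on a single line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Sketch only: PyLiteralWorker is a hypothetical helper, not part of any library.
class PyLiteralWorker implements Closeable {
    // Python side: read one literal per line, answer with one line of JSON.
    static final String WORKER_SCRIPT = String.join("\n",
        "import sys, ast, json",
        "for line in iter(sys.stdin.readline, ''):",
        "    print(json.dumps(ast.literal_eval(line.strip())))",
        "    sys.stdout.flush()");

    private final Process process;
    private final BufferedWriter toPython;
    private final BufferedReader fromPython;

    PyLiteralWorker(String pythonExecutable) throws IOException {
        // The interpreter is started once and reused for every value.
        process = new ProcessBuilder(pythonExecutable, "-c", WORKER_SCRIPT)
            .redirectError(ProcessBuilder.Redirect.INHERIT) // show Python tracebacks on our stderr
            .start();
        toPython = new BufferedWriter(
            new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromPython = new BufferedReader(
            new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    // synchronized: request and response must not interleave between threads.
    synchronized String toJson(String pythonLiteral) throws IOException {
        toPython.write(pythonLiteral);
        toPython.newLine();
        toPython.flush();
        String json = fromPython.readLine();
        if (json == null) {
            throw new IOException("Python worker exited (bad literal or missing interpreter?)");
        }
        return json;
    }

    @Override
    public void close() throws IOException {
        toPython.close();   // EOF on stdin ends the Python loop
        process.destroy();
    }
}
```

Usage would be along the lines of `try (PyLiteralWorker w = new PyLiteralWorker("python")) { String json = w.toJson(row); }`. One worker per thread (or a small pool) avoids contention on the single pipe; a malformed literal terminates the worker in this sketch, so real code would want to catch the exception on the Python side instead.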
GhostCat

Hacky solution

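Do a string replace ' -> ", True -> true, False -> false, and None -> null, then parse the result as JSON. For Python 2 data like the example, the u prefix on strings has to be stripped as well. If you are lucky (and are willing to bet on remaining lucky in the future), this can actually work in practice.

See rh-messaging/cli-rhea/blob/main/lib/formatter.js#L240-L249 for the same trick in JavaScript, applied in the opposite direction (JSON to Python notation):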

static replaceWithPythonType(strMessage) {
    return strMessage.replace(/null/g, 'None').replace(/true/g, 'True').replace(/false/g, 'False').replace(/undefined/g, 'None').replace(/\{\}/g, 'None');
}
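
In Java, the replacements described above could be sketched roughly as follows. `NaivePyDictToJson` and `convert` are hypothetical names chosen for this illustration, and the approach is exactly as fragile as described: any string value that itself contains a quote character, or words such as True or None, will be corrupted.

```java
// Sketch only: NaivePyDictToJson and convert() are hypothetical names.
class NaivePyDictToJson {
    static String convert(String py) {
        return py
            .replaceAll("\\bu'", "'")            // drop the Python 2 unicode prefix: u'x' -> 'x'
            .replace('\'', '"')                  // single quotes -> double quotes
            .replaceAll("\\bTrue\\b", "true")    // Python constants -> JSON constants
            .replaceAll("\\bFalse\\b", "false")
            .replaceAll("\\bNone\\b", "null");
    }

    public static void main(String[] args) {
        System.out.println(convert("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"));
        // prints {"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}
    }
}
```

A value such as `{'note': "it's True"}` is enough to break it: the apostrophe becomes a stray double quote and the text inside the string is rewritten to `true`.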

Skylark solution

Skylark is a subset (data-definition) language based on Python. There are parsers in Go, Java, Rust, C, and Lua listed on the project's page. The problem is that the Java artifacts aren't published anywhere, as discussed in Q: How do I include a Skylark configuration parser in my application?

Graal Python

Possibly this, https://github.com/oracle/graalpython/issues/96#issuecomment-1662566214

DIY Parsers

I was not able to find a parser specific to the Python literal notation. The ANTLR samples contain a Python grammar that could plausibly be cut down to work for you: https://github.com/antlr/grammars-v4/tree/master/python/python3
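
For the common cases, a hand-written recursive-descent converter is also small enough to write directly. The sketch below is an illustration under stated assumptions, not a complete implementation of Python's literal grammar: `PyLiteralToJson` and `convert` are hypothetical names, and it only covers dicts, lists, tuples, quoted strings with an optional u prefix, plain numbers, and True/False/None. It does not handle sets, bytes literals, triple-quoted strings, complex numbers, `inf`/`nan`, or Python-only escapes such as `\U` and `\N{...}`, and it treats `(1)` as a one-element tuple.

```java
import java.math.BigDecimal;

// Sketch only: PyLiteralToJson and convert() are hypothetical names, not a library API.
class PyLiteralToJson {
    private final String src;
    private int pos = 0;

    private PyLiteralToJson(String src) { this.src = src; }

    /** Converts one Python literal to JSON text, or throws IllegalArgumentException. */
    static String convert(String pythonLiteral) {
        PyLiteralToJson p = new PyLiteralToJson(pythonLiteral);
        StringBuilder out = new StringBuilder();
        p.value(out);
        p.skipSpaces();
        if (p.pos != p.src.length()) throw p.error("trailing input");
        return out.toString();
    }

    private void value(StringBuilder out) {
        skipSpaces();
        char c = peek();
        if (c == '{') dict(out);
        else if (c == '[') sequence(out, ']');
        else if (c == '(') sequence(out, ')');             // tuples become JSON arrays
        else if (c == '\'' || c == '"') string(out);
        else if ((c == 'u' || c == 'U') && pos + 1 < src.length()
                && "'\"".indexOf(src.charAt(pos + 1)) >= 0) { pos++; string(out); }
        else if (keyword("True")) out.append("true");
        else if (keyword("False")) out.append("false");
        else if (keyword("None")) out.append("null");
        else number(out);
    }

    private void dict(StringBuilder out) {
        pos++;                                             // consume '{'
        out.append('{');
        for (boolean first = true; ; first = false) {
            skipSpaces();
            if (peek() == '}') { pos++; break; }
            if (!first) out.append(", ");
            StringBuilder key = new StringBuilder();
            value(key);
            // JSON keys must be strings, so scalar keys such as 1 are quoted
            if (key.charAt(0) == '"') out.append(key);
            else out.append('"').append(key).append('"');
            skipSpaces();
            if (peek() != ':') throw error("expected ':'");
            pos++;
            out.append(": ");
            value(out);
            separator('}');
        }
        out.append('}');
    }

    private void sequence(StringBuilder out, char close) {
        pos++;                                             // consume '[' or '('
        out.append('[');
        for (boolean first = true; ; first = false) {
            skipSpaces();
            if (peek() == close) { pos++; break; }
            if (!first) out.append(", ");
            value(out);
            separator(close);
        }
        out.append(']');
    }

    // After an element: accept a comma (trailing commas allowed) or leave the closer in place.
    private void separator(char close) {
        skipSpaces();
        if (peek() == ',') pos++;
        else if (peek() != close) throw error("expected ',' or '" + close + "'");
    }

    private void string(StringBuilder out) {
        char quote = src.charAt(pos++);
        out.append('"');
        while (true) {
            if (pos >= src.length()) throw error("unterminated string");
            char c = src.charAt(pos++);
            if (c == quote) break;
            if (c == '"') out.append("\\\"");              // bare " inside '...' must be escaped
            else if (c == '\\') {
                if (pos >= src.length()) throw error("unterminated escape");
                char e = src.charAt(pos++);
                if (e == '\'') out.append('\'');           // \' is not a JSON escape
                else if (e == 'x') { out.append("\\u00").append(src, pos, pos + 2); pos += 2; }
                else out.append('\\').append(e);           // \\ \" \n \t \r \uXXXX pass through
            } else out.append(c);
        }
        out.append('"');
    }

    private void number(StringBuilder out) {
        int start = pos;
        while (pos < src.length() && "+-0123456789.eE".indexOf(src.charAt(pos)) >= 0) pos++;
        if (pos == start) throw error("unexpected character");
        String text = src.substring(start, pos);
        if (peek() == 'L') pos++;                          // Python 2 long suffix, e.g. 10L
        new BigDecimal(text);                              // throws NumberFormatException on junk
        out.append(text);
    }

    private boolean keyword(String word) {
        if (!src.startsWith(word, pos)) return false;
        pos += word.length();
        return true;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpaces() { while (Character.isWhitespace(peek())) pos++; }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at index " + pos);
    }
}
```

With this sketch, `PyLiteralToJson.convert("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}")` returns `{"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}`. It runs entirely in the JVM, so it avoids the per-row process cost, but as noted in the comments above, a home-grown parser is prone to missing edge cases and should be validated against a sample of the real data.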

user7610