
I have a workflow where output from one process is input to the next.

Process A outputs JSON.

Process B's input needs to be JSON.

However, since I pass the JSON as a command-line argument, it becomes a string.

The command below is not under my control: it is autogenerated by Nextflow. The solution does not need to involve JSON, but I do need to access these values, keeping in mind that what arrives is essentially just a string.

python3.7 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'

typing.py

import json
import sys

def download_this(task_as_string):
    print("Normal")
    print(task_as_string)

    first = json.dumps(task_as_string)
    print("after json.dumps")
    print(first)

    second = json.loads(first)
    print("after json.loads")
    print(second)
    print(type(second))

if __name__ == "__main__":
    download_this(sys.argv[1])

I thought that calling json.dumps and then json.loads would make it work, but it does not.

Output

Normal
{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}
after json.dumps
"{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}"
after json.loads
{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}
<class 'str'>

And if I do print(second["task"]), I get a "string indices must be integers" error:

Traceback (most recent call last):
  File "typing.py", line 78, in <module>
    download_this(sys.argv[1])
  File "typing.py", line 55, in download_typed_hla
    print(second["task"])    
TypeError: string indices must be integers

So it was never converted to a dict in the first place. Any ideas how I can get around this problem?

DUDANF
  • Looks like you have things the wrong way round. If the item is a string, you need `json.loads`, not `json.dumps`. – Daniel Roseman Oct 18 '19 at 14:29
  • Not quite. The string is `{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}`, and JSON needs strings to be in double quotes. Since there are no double quotes, it isn't recognised as a dict. – DUDANF Oct 18 '19 at 14:30
  • How about the thing discussed here: https://stackoverflow.com/q/34812821/4636715 – vahdet Oct 18 '19 at 14:32
  • Wat? Your `task_as_string` is a string. You dump it to a string-in-a-string. You then load it again to be just-a-string. It's never a dict. You're not passing valid JSON in to start with, so you can't handle it as JSON. Dumping it to JSON doesn't improve that situation. – deceze Oct 18 '19 at 14:33
  • JSON is always a string. `json.loads` turns a string into a Python object (this includes turning a string like `'"foo"'` into the Python `str` object `'foo'`). – chepner Oct 18 '19 at 14:38
  • Yes guys, this is the problem I have. I do not type `python3.7 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'` myself. This is generated by Nextflow, based on my outputting my JSON to stdout. I don't even need to use it as JSON, I just need to access the values. How could we make that happen? Keep in mind it is passed in as a string that looks like JSON without double quotes. – DUDANF Oct 18 '19 at 14:51
  • So, you have something that's *not JSON*. That's bad for starters. You should figure out what format that's *supposed* to be if it's not JSON. It looks like it might be parsable by `ast.literal_eval`. Secondly, the value isn't correctly escaped when passed as command line argument, so you're even losing the single quotes within the string (because they're interpreted by the shell because they're not escaped). – deceze Oct 18 '19 at 15:02
  • A good workaround would be to save the JSON to a file instead of passing the JSON as a string. But that's exactly why I've posted it to SO: to get more ideas and different views. – DUDANF Oct 18 '19 at 15:05

1 Answer


A couple of things:

  1. Your JSON is not properly formatted. Keys and string values need to be enclosed in double quotes.
  2. You are passing in a stringified version of the JSON, and then you stringify it further before trying to load it. Just load it directly.

import json

def download_this(task_as_string):
    print("Normal")
    print(task_as_string)

    second = json.loads(task_as_string)
    print("after json.loads")
    print(second)
    print(type(second))

download_this('{"id": "3283", "code": "1234", "task": "66128b3b-3440-4f71-9a6b-c788bc9f5d2c"}')

Normal
{"id": "3283", "code": "1234", "task": "66128b3b-3440-4f71-9a6b-c788bc9f5d2c"}
after json.loads
{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}
<class 'dict'>
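
As for why the quotes are already gone by the time Python sees the argument: the single quotes inside the generated command are consumed by the shell, as noted in the comments on the question. A minimal sketch that reproduces this without running a shell, assuming a POSIX-style shell and using the standard library's `shlex.split` to approximate how the command line is split into `sys.argv`:

```python
import shlex

# The command line as Nextflow generated it (copied from the question).
command = ("python3.7 typing.py "
           "'{'id': '3283', 'code': '1234', "
           "'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'")

# shlex.split applies POSIX quoting rules: each '...' pair is removed and
# the adjacent quoted and unquoted pieces are joined into a single word.
argv = shlex.split(command)

print(argv[2])
# {id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}
```

The result matches the "Normal" output in the question, so no amount of `json.dumps` or `json.loads` inside the script can bring the quotes back.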

To get around your input problem, provided that you trust the input from Nextflow to conform to a simple dictionary-like structure, you could do something like this:

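# Strip the braces, then split the string into "key: value" groups.
d = dict()
for group in task_as_string.replace('{', '').replace('}', '').split(','):
    # Split on the first colon only, so a value containing ':' stays intact.
    key, value = group.split(':', 1)
    d[key.strip()] = value.strip()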

print(d)
print(type(d))

python3 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'
{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}
<class 'dict'>

If the JSON coming from Nextflow is more complicated (i.e. with nesting and/or lists), then you'll have to come up with a more suitable parsing mechanism.
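
One possible direction for nested input, as a rough sketch rather than a robust parser: put the double quotes back around every bare token and then hand the result to `json.loads`. The helper name `parse_unquoted` and the nested sample string are hypothetical, and the approach assumes that keys and values never contain whitespace, commas, colons, braces, or brackets. Every value comes back as a string, including numbers.

```python
import json
import re

# A bare token: any run of characters that is neither whitespace
# nor JSON punctuation ({ } [ ] , :).
BARE_TOKEN = re.compile(r'[^\s{}\[\],:]+')

def parse_unquoted(text):
    """Wrap every bare token in double quotes, then parse as JSON."""
    quoted = BARE_TOKEN.sub(lambda m: json.dumps(m.group()), text)
    return json.loads(quoted)

# The string from the question, as it arrives in sys.argv[1].
task = parse_unquoted("{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}")
print(task["task"])    # 66128b3b-3440-4f71-9a6b-c788bc9f5d2c

# A made-up nested example to show lists and inner objects.
nested = parse_unquoted("{id: 3283, tags: [a, b], meta: {k: v}}")
print(nested["tags"])  # ['a', 'b']
```

If a value does contain one of the excluded characters (a space, for instance), `json.loads` raises `json.JSONDecodeError`, so malformed input fails loudly instead of being silently misparsed.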

sloppypasta
  • Yes, the workflow tool I'm using, Nextflow, handles the input/output. Nextflow passes it in the way I've shown, without quotes. So I need to figure out a way to access the data from `python3.7 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'` – DUDANF Oct 18 '19 at 14:49
  • @daudnadeem I updated the answer to parse the string into a dict. – sloppypasta Oct 18 '19 at 16:05