
I have a string (from an API call) that looks something like this:

val=

{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},
{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}

I need to do temp = json.loads(val), but the string is not valid JSON: the keys and values do not have quotes around them. When I added the quotes by hand, it worked.

How can I programmatically add the quotes to such a string before parsing it as JSON?

Also, how can I replace the scientific notation in the numbers with decimals? E.g. 0d-2 becomes "0" and 8.0d-1 becomes "0.8".

  • You can, but it is not reliable. The best way forward is to contact the provider of this API. They should follow the JSON standards. BTW that "scientific notation" looks weird. Is the -2 about the exponent or about the mantissa? In standard scientific notation 0E-2 would still be 0. – trincot Feb 07 '22 at 11:33
  • it looks "something like this" or "exactly like this"? Is "val=" part of the string or `val` is the variable containing the string? And if it's a string why it's not between quotes? –  Feb 07 '22 at 11:40
  • Yes I think "0d-2" is "0" only, but "7.0d-1" implies "0.7" and so on... Regarding contacting the API provider, I have done that but the changes will take a while, hence need a temporary workaround for this... – Debapratim Chakraborty Feb 07 '22 at 11:42
  • `val` is the variable. I formatted the string to make it a little more readable. Here's the raw version of it : `val="{input:a,matches:[{in:[\"w\",\"x\",\"y\",\"z\"],output:{num1:0d-2,num2:7.0d-1}},{in:[\"w\",\"x\"],output:{num1:0d-2,num2:8.0d-1}}]}"` – Debapratim Chakraborty Feb 07 '22 at 11:45
  • That does not look like JSON. Can you add the response Content-Type header to the question? Maybe it is some weird RPC message. – Andrzej Bistram Feb 07 '22 at 11:58

1 Answer

You could catch anything that's a string with a regex and replace it accordingly.

Assuming your strings that need quotes:

  • start with a letter
  • can have numbers at the end
  • never start with numbers
  • never have numbers or special characters in between them

This regex would catch them:

([a-z]*\d*):

You can try it out here. Or learn more about regex here.

Let's do it in Python:

import re

# catch a string in json

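# the literal is split over two lines with implicit concatenation,
# because a single-quoted string cannot contain a raw line break
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!

# search for the first string that matches our rule
# (raw string, so that \d is passed to the regex engine unchanged)
string_search = re.search(r'([a-z]*\d*):', json_string)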

# extract the first capture group; so everything we matched in brackets
# this is to exclude the colon at the end from the found string as
# we don't want to enquote the colons as well
extracted_strings = string_search.group(1)

This finds only the first match, which is useful if you plan to build a loop later. However, if you just want to collect all matching strings in a Python list, you can simply do the following instead:

import re

# catch ALL strings in json

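# the literal is split over two lines with implicit concatenation,
# because a single-quoted string cannot contain a raw line break
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!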
extract_all_strings = re.findall(r'([a-z]*\d*):', json_string)
# note that this by default catches only our capture group in brackets
# so no extra step required
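
As a quick check of what the pattern picks up on the sample string, here is a small sketch that prints the matches. It uses `[a-z]+` instead of `[a-z]*`, which is my own assumption based on the "start with a letter" rule above, so that a bare colon can never produce an empty match. The variable names `keys` and `unique_keys` are only illustrative.

```python
import re

json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')

# Assumption: a key is one or more lowercase letters, optionally followed
# by digits, directly in front of a colon.
keys = re.findall(r'([a-z]+\d*):', json_string)
print(keys)
# ['input', 'matches', 'in', 'output', 'num1', 'num2', 'in', 'output', 'num1', 'num2']

# De-duplicated view, handy for eyeballing before replacing anything.
unique_keys = sorted(set(keys))
print(unique_keys)
# ['in', 'input', 'matches', 'num1', 'num2', 'output']
```

Note that only the keys are found: the bare value `a` and the numbers such as `0d-2` are not followed by a colon, so this pattern does not see them.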

That covers the basics of regex and finding every match.

With these basics you could either use re.sub to replace every match with itself in quotes, or first generate a list of replacements to verify that everything went right (probably something you'd want to do, given that this approach is a little unstable). This is why I wrote a more comprehensive answer instead of just pointing you to a "re.sub" one-liner.
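
A minimal sketch of the re.sub route, under assumptions that go beyond the rules listed above: every bare token consists only of letters, digits, `_`, `.`, `+` and `-`, and already-quoted strings contain no escaped quotes. The names `TOKEN` and `quote_bare_tokens` are hypothetical helpers, not part of any library. The pattern matches either a quoted string (left untouched) or a bare token (wrapped in quotes), so keys, the bare value `a` and the numbers are all handled in one pass.

```python
import json
import re

raw = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
       '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')

# Group 1: an already-quoted string. Group 2: a bare token.
# Assumption: quoted strings never contain escaped quotes.
TOKEN = re.compile(r'("[^"]*")|([A-Za-z0-9_.+-]+)')

def quote_bare_tokens(text):
    """Wrap every bare token in double quotes, leaving quoted strings as they are."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)          # already quoted, keep unchanged
        return '"' + match.group(2) + '"'  # bare key or value, add quotes
    return TOKEN.sub(repl, text)

quoted = quote_bare_tokens(raw)
data = json.loads(quoted)
print(data["input"])                         # a
print(data["matches"][0]["output"]["num2"])  # 7.0d-1
```

All values come back as strings, including the numbers, which is what was asked for. If the API ever returns strings with spaces, escaped quotes or other punctuation, this sketch will need adjusting.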

You can approach your scientific number notation problem in the same way.
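
For the numbers, here is a hedged sketch that assumes the `d` plays the role of `e` in ordinary scientific notation (as in Fortran double-precision literals), so `7.0d-1` means 7.0 × 10⁻¹. That reading matches the examples in the question, but it is not confirmed by the API. `D_NUMBER` and `d_notation_to_decimal` are hypothetical names. `Decimal` is used instead of `float` to avoid binary rounding noise in the output.

```python
import re
from decimal import Decimal

# Assumption: mantissa, then "d" or "D", then a signed integer exponent.
# The lookarounds keep the pattern from matching inside names such as "num1".
D_NUMBER = re.compile(r'(?<![\w.])(\d+(?:\.\d*)?)[dD]([+-]?\d+)(?![\w.])')

def d_notation_to_decimal(text):
    """Rewrite numbers like 8.0d-1 as plain decimals like 0.8."""
    def repl(match):
        value = Decimal(match.group(1) + 'e' + match.group(2))
        # normalize() drops trailing zeros; the 'f' format avoids exponents.
        return format(value.normalize(), 'f')
    return D_NUMBER.sub(repl, text)

print(d_notation_to_decimal('{num1:0d-2,num2:8.0d-1}'))
# {num1:0,num2:0.8}
```

Running this on the raw string before adding any quotes means the converted numbers are then treated like any other bare value.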

Cerealz
  • Thanks. But this doesn't catch the `a` at the beginning of the input; or the figures like `0d-2`. Those should be in quotes as well. – Debapratim Chakraborty Feb 07 '22 at 12:17
  • @DebapratimChakraborty This would catch your a: ":([a-zA-Z]*)" (without the quotes). About the scientific notation: I am not sure how your pattern works there. Hence I linked you the material to build your own solution to do exactly what you want in the first two links. – Cerealz Feb 07 '22 at 17:43