
I am parsing JSON data into SQL queries in Python and need to replace single quotation marks with double quotation marks, because the data I'm getting uses the wrong notation (and I can't change that). The problem I ran into is that some strings contain written English text with single quotation marks (apostrophes) of their own:

'comment': 'bla bla it's you're can't bla bla',

How do I only replace the ones within written text and not the ones defining attributes? What would a regex for this look like?

yarvis
  • TL;DR: this is nowhere near valid JSON. If it's supposed to be, this *really* must be fixed on the originating side. If that's not possible, you're somewhat SOL. There's no guarantee something like this can be safely parsed at all. – deceze Mar 06 '20 at 14:15
  • I'm really curious to know the origin of these dubious JSONs. Like who in their right mind thought it was okay to release these monstrosities into the wild when perfectly good libraries exist for nearly all programming languages. – MonkeyZeus Mar 06 '20 at 14:18
  • I really don't know why I am getting this invalid JSON from the API I'm accessing, but I have no way of changing it before receiving. Just thought there may be a workaround for that. – yarvis Mar 06 '20 at 14:21
  • Either you're somehow misinterpreting what you're getting from that API, or you seriously need to get in touch with that API's author and talk some sense into them. – deceze Mar 06 '20 at 14:26
  • The API I'm accessing is one of Google's. I can't supply any dumps since it's sensitive data, but I just checked by doing a sample request by Google and I am getting the same invalid json. Gonna check in with their support. Maybe there is something wrong on my side I'm missing. – yarvis Mar 06 '20 at 14:37

2 Answers


While I agree with all the comments on your question, just as an exercise I tried to get a valid JSON string out of what you have. It seems this can be done in a few steps of string manipulation:

import json

bad = "'comment': 'bla not, really, a comment: bla it's you're can't bla bla',"
# note that bad has colons, commas and single quotes/apostrophes in it

one = bad.replace("': '", '": "')  # separate the key from the value
two = one.replace("'", '"', 1)  # replace the single quote on the left side of the key with a double quote

# the following lines were lifted from https://stackoverflow.com/a/54945804/9448090
# replace the single quote on the right side of the value with a double quote; drop the last comma:

removal = "'"
reverse_removal = removal[::-1]
replacement = '"'
reverse_replacement = replacement[::-1]

three = two[::-1].replace(reverse_removal, reverse_replacement, 1)[::-1].replace('",', '"')
good = "{" + three + "}"  # final formatting for json
json.loads(good)

Output:

{'comment': "bla not, really, a comment: bla it's you're can't bla bla"}
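
The steps above could be wrapped in a small helper, sketched here under the same assumptions: each fragment holds exactly one key/value pair, and the key itself does not contain the sequence `': '`. The name `repair_pair` is made up for illustration. It builds a Python dict directly, so `json.dumps` can take care of any escaping afterwards.

```python
import json


def repair_pair(fragment: str) -> dict:
    """Turn one "'key': 'value'," fragment into a one-item dict (hypothetical helper)."""
    # Split at the FIRST "': '" only, so colons inside the value are left alone.
    key_part, sep, value_part = fragment.strip().partition("': '")
    if not sep:
        raise ValueError("no key/value separator found")

    # Drop the opening quote of the key, if present.
    key = key_part[1:] if key_part.startswith("'") else key_part

    # Drop trailing commas, then exactly one closing quote of the value.
    value = value_part.rstrip().rstrip(",").rstrip()
    if value.endswith("'"):
        value = value[:-1]

    return {key: value}


bad = "'comment': 'bla not, really, a comment: bla it's you're can't bla bla',"
pair = repair_pair(bad)
print(json.dumps(pair))  # valid JSON, apostrophes inside the value are untouched
```

This only works one pair at a time; it does not solve the problem of finding where one pair ends and the next begins.
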
Jack Fleeting

Provided that you assume there are no commas and/or colons in your strings, you might be able to recover by treating everything between a : and a , as a string. This could, for example, be accomplished by splitting with a regular expression.

In [1]: import re

In [2]: s = "'comment1': 'bla bla it's you're can't bla bla','comment2': 'bla bla it's you're can't bla bla',"

In [3]: r = re.compile(r"[:,]")

In [4]: r.split(s)
Out[4]:
["'comment1'",
 " 'bla bla it's you're can't bla bla'",
 "'comment2'",
 " 'bla bla it's you're can't bla bla'",
 '']

Granted, that is a pretty big "if". If there is even a chance that your strings contain comma/colon characters then deceze is correct and you are SOL.
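
If that assumption does hold, the split result could be paired up into a dict along these lines. This is only a sketch: `parse_simple_pairs` and `unquote` are hypothetical names, and the function breaks as soon as a value contains a comma or colon.

```python
import re


def unquote(token: str) -> str:
    """Strip one pair of surrounding single quotes, if present."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return token[1:-1]
    return token


def parse_simple_pairs(s: str) -> dict:
    """Pair up the pieces of a split on ':' and ','.

    Assumes no key or value contains a comma or a colon.
    """
    parts = [p.strip() for p in re.split(r"[:,]", s)]
    parts = [p for p in parts if p]  # drop the empty piece after the last comma
    if len(parts) % 2 != 0:
        raise ValueError("odd number of pieces; a value probably contains ',' or ':'")
    # Even positions are keys, odd positions are values.
    return {unquote(k): unquote(v) for k, v in zip(parts[::2], parts[1::2])}


s = "'comment1': 'bla bla it's you're can't bla bla','comment2': 'bla bla it's you're can't bla bla',"
result = parse_simple_pairs(s)
print(result)
```
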

In general, there is no solution to this problem. To see this, consider the following (somewhat contrived) example.

 ... 'comment': 'this is', 'my comments': 'Hi', 

If strings wrapped in ' are allowed to contain ', then there is no way to tell whether this is meant as 'comment': "this is', 'my comments': 'Hi", or as 'comment': "this is", 'my comments': "Hi", ...
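
To make the ambiguity concrete, here is a small sketch in which both readings are written out by hand as valid JSON. Both parse without error, but they give different results, so no parser can pick the "right" one from the original text alone. The variable names are made up for illustration.

```python
import json

# Reading A: one key whose value contains the rest of the text.
reading_a = """{"comment": "this is', 'my comments': 'Hi"}"""

# Reading B: two separate key/value pairs.
reading_b = """{"comment": "this is", "my comments": "Hi"}"""

parsed_a = json.loads(reading_a)
parsed_b = json.loads(reading_b)

print(len(parsed_a), parsed_a)  # a dict with a single key
print(len(parsed_b), parsed_b)  # a dict with two keys
```
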

yardsale8
  • Thanks for your help. I already know that there are in fact commas and/or colons in these strings so yeah. I'm SOL. – yarvis Mar 06 '20 at 14:44