2

I have the following string, without quotes and with arrays and sub-dictionaries:

s ='{source: [s3, kinesis], aws_access_key_id: {myaws1, myaws2}, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

I would like to convert it to a dictionary.

Eric Bellet
  • Possible duplicate of [Convert a String representation of a Dictionary to a dictionary?](https://stackoverflow.com/questions/988228/convert-a-string-representation-of-a-dictionary-to-a-dictionary) – Ahndwoo Oct 25 '19 at 15:17
  • @Ahndwoo It is not a duplicate; in my case the string is without quotes – Eric Bellet Oct 25 '19 at 15:19
  • In order to evaluate the string you need quotes on keys and string values. You will need to parse the string into a dictionary yourself, using separators such as `,` and `: ` – Veilkrand Oct 25 '19 at 15:20
  • @Veilkrand That is not true. I need to do something similar to this post: https://stackoverflow.com/questions/52612435/how-to-convert-string-without-quotes-to-dictionary-in-python/52612645 – Eric Bellet Oct 25 '19 at 15:23
  • @EricBellet check my answer – Veilkrand Oct 25 '19 at 15:29
  • You can sequentially parse the string character by character, only splitting key-value pairs when you hit a separator while not within an 'inner object' (this can be checked by tracking the number of open and close braces and brackets). It's tedious, but it will work. – Cheri Oct 25 '19 at 15:45
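
A minimal sketch of the depth-tracking split that Cheri's comment describes (the helper name and the sample string below are only illustrative):

def split_top_level(text, sep=','):
    # Split text on sep, but only at nesting depth 0 of {} and [].
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in '{[':
            depth += 1
        elif ch in '}]':
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts

print(split_top_level('source: [s3, kinesis], fileType: zip'))
# ['source: [s3, kinesis]', 'fileType: zip']

You would then split each part on the first ': ' and recurse into values that start with '{' or '['.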

4 Answers

5

You can use regular expressions to add the quotes to the string before trying to evaluate it:

import re
import ast

s = "{source: s3, aws_access_key_id: myaws, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}"

s = re.sub(r':\s?(?![{\[\s])([^,}]+)', r': "\1"', s) #Add quotes to dict values
s = re.sub(r'(\w+):', r'"\1":', s) #Add quotes to dict keys

def add_quotes_to_lists(match):
    return re.sub(r'([\s\[])([^\],]+)', r'\1"\2"', match.group(0))

s = re.sub(r'\[[^\]]+', add_quotes_to_lists, s) #Add quotes to list items

final = ast.literal_eval(s) #Evaluate the dictionary

print(final)

Not the prettiest solution, and since I only have one example of input I can't guarantee how robust it is, but it works for the sample provided.
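
As a quick sanity check, assuming the substitutions above produce valid Python literal syntax for this sample (as claimed), you can spot-check a couple of the parsed values:

print(final["prefixToTables"]["Tracking_Sent"])  # should print MNG_TRACKING_EXTRACT_SENT_3
print(final["filePaths"])                        # should print ['s3Sensor/2018/']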

Ahndwoo
2

I don't think this is very easy to do in a robust way with just the built-in modules, so here's a solution that makes use of PyParsing. I took the example jsonParser.py and modified it to recognize strings that don't use quote marks, and added a set literal for your {myaws1, myaws2} value.

import pyparsing as pp
from pyparsing import pyparsing_common as ppc

def make_keyword(kwd_str, kwd_value):
    return pp.Keyword(kwd_str).setParseAction(pp.replaceWith(kwd_value))
TRUE  = make_keyword("true", True)
FALSE = make_keyword("false", False)
NULL  = make_keyword("null", None)

LBRACK, RBRACK, LBRACE, RBRACE, COLON = map(pp.Suppress, "[]{}:")

jsonString = pp.OneOrMore(pp.CharsNotIn('{}[]:,')).setParseAction(lambda s, l, t: [t[0].strip()])
jsonNumber = ppc.number()

jsonObject = pp.Forward()
jsonValue = pp.Forward()
jsonElements = pp.delimitedList( jsonValue )
jsonArray = pp.Group(LBRACK + pp.Optional(jsonElements, []) + RBRACK)
jsonSet = pp.Group(LBRACE + pp.Optional(jsonElements, []) + RBRACE).setParseAction(lambda s,l,t: set(t[0]))
jsonValue << (jsonNumber | jsonString | pp.Group(jsonObject)  | jsonArray | jsonSet | TRUE | FALSE | NULL)
memberDef = pp.Group(jsonString + COLON + jsonValue)
jsonMembers = pp.delimitedList(memberDef)
jsonObject << pp.Dict(LBRACE + pp.Optional(jsonMembers) + RBRACE)

jsonComment = pp.cppStyleComment
jsonObject.ignore(jsonComment)


if __name__ == "__main__":
    s ='{source: [s3, kinesis], aws_access_key_id: {myaws1, myaws2}, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

    results = jsonObject.parseString(s)
    print(results.asDict())

Result:

{'source': ['s3', 'kinesis'], 'aws_access_key_id': {'myaws1', 'myaws2'}, 'aws_secret_access_key': 'REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY', 'bucketName': 'bucket', 'region_name': 'eu-west-1', 'fileType': 'zip', 'typeIngestion': 'FULL', 'project': 'trackingcampaigns', 'functionalArea': 'client', 'filePaths': ['s3Sensor/2018/'], 'prefixFiles': ['Tracking_Sent', 'Tracking_Bounces', 'Tracking_Opens', 'Tracking_Clicks', 'Tracking_SendJobs'], 'prefixToTables': {'Tracking_Bounces': 'MNG_TRACKING_EXTRACT_BOUNCES_3', 'Tracking_Sent': 'MNG_TRACKING_EXTRACT_SENT_3', 'Tracking_Clicks': 'MNG_TRACKING_EXTRACT_CLICKS_3', 'Tracking_Opens': 'MNG_TRACKING_EXTRACT_OPENS_3', 'Tracking_SendJobs': 'MNG_TRACKING_EXTRACT_SENDJOBS_3'}, 'stagingPath': '/zipFiles/'}
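
The same grammar should also handle other unquoted strings; for example, parsing a smaller, made-up input:

other = '{project: demo, ids: [1, 2, 3], nested: {a: b}}'
print(jsonObject.parseString(other).asDict())
# should print something like {'project': 'demo', 'ids': [1, 2, 3], 'nested': {'a': 'b'}}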
Kevin
0

You could use regular expressions to format the string before preparing its JSON representation.

import re

result = {}  # will hold the extracted key/value pairs
pattern = re.compile(r'(\w*):\s(\w*)')
matches = re.finditer(pattern, s)  # s is the unquoted string from the question
for match in matches:
    result[match.group(1)] = match.group(2)
print(result)

Your string shows a pattern where each key-value pair is separated by a `:` followed by whitespace. `\w*` matches any number of word characters (letters, digits, and underscores), and `\s` detects the whitespace. The finditer method returns an iterable that you can loop through to grab the groups inside your pattern. You can read more about group ids here.
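
Note that this pattern only captures simple word values, so list and nested-dictionary values (which the question's string contains) are dropped or flattened. A small, made-up example of the limitation:

import re

pattern = re.compile(r'(\w*):\s(\w*)')
sample = '{bucketName: bucket, filePaths: [s3Sensor/2018/]}'
print({m.group(1): m.group(2) for m in pattern.finditer(sample)})
# should print {'bucketName': 'bucket', 'filePaths': ''} -- the list value is lost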

Abhyudai
-3

Not tested, but you could do some string cleanup and split the keys and values into a dictionary:

s ='{source: s3, aws_access_key_id: myaws, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

s = s[1:-1]
data = {i.split(': ')[0]: i.split(': ')[1] for i in s.split(', ')}
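
Note that on the question's actual string this comprehension breaks: splitting on ', ' also splits inside the bracketed lists and the nested dictionary, leaving pieces such as 'Tracking_Bounces' with no ': ' in them, so the [1] index raises an IndexError. It only works for a flat, made-up input like:

flat = '{bucketName: bucket, fileType: zip, region_name: eu-west-1}'
flat = flat[1:-1]
print({i.split(': ')[0]: i.split(': ')[1] for i in flat.split(', ')})
# {'bucketName': 'bucket', 'fileType': 'zip', 'region_name': 'eu-west-1'}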

Veilkrand
  • This is the answer from the post that I shared with you. I have arrays and sub-dictionaries as values, so that solution does not work – Eric Bellet Oct 25 '19 at 15:29