Since Python caused this problem, I think the best solution is to let Python fix it ;-). Fortunately, with Jython you can stay with a pure Java implementation.
First, add the Jython standalone dependency to your pom.xml:
<dependencies>
    <dependency>
        <groupId>org.python</groupId>
        <artifactId>jython-standalone</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.5</version>
    </dependency>
</dependencies>
(As you can see, I also used Apache Commons IO in my example, so I added it as well.)
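If you would rather not pull in Commons IO just to read one file, the JDK can do the same job. The following is only a minimal sketch: the class name `ReadFileSketch` and the helper `readUtf8` are made up for illustration, and a temporary file stands in for the real input file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileSketch {
    // Hypothetical helper: reads the whole file as UTF-8 text using only the JDK.
    public static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file.
        Path tmp = Files.createTempFile("json", ".txt");
        try {
            Files.write(tmp, "{'url': u'https://somedomain/'}".getBytes(StandardCharsets.UTF_8));
            System.out.println(readUtf8(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With a helper like this, the `FileUtils.readFileToString(...)` call could be swapped for `readUtf8(Paths.get("c:/temp/json.txt"))` and the commons-io dependency dropped.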
I put your (invalid) JSON string into the text file "c:/temp/json.txt", which has the following content:
{'url': u'https://somedomain/', 'fields': {'policy':
u'eyJjb25kaXRpb25zIjogW1siYfgfhudGVudC1sZjMyWiJ9', 'AWSAccessKeyId':
u'ASIccccccNA', 'x-amz-security-token': 'FQofgF', 'key': u'bbb.file',
'signature': u'rm9gdflkjfs='}}
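Here is the code that reads the file, sets up the Python interpreter and hands the content over to Python to clean it up. Note that the file content is executed as Python code, so only do this with input you trust: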
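import java.io.File;
import org.apache.commons.io.FileUtils;
import org.python.core.PyObject;
import org.python.util.PythonInterpreter;

// Read the Python-style dict literal from the file.
String content = FileUtils.readFileToString(new File("c:/temp/json.txt"), "UTF-8");
PythonInterpreter pi = new PythonInterpreter();
pi.exec("import json");
// Evaluate the literal as Python source, then serialize the resulting dict as JSON.
pi.exec("jsondata = " + content);
pi.exec("jsonCleaned = json.dumps(jsondata)");
// Fetch the JSON string back into Java.
PyObject jsonCleaned = pi.get("jsonCleaned");
System.out.println(jsonCleaned.asString());
pi.close();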
The output is:
{"url": "https://somedomain/", "fields": {"signature": "rm9gdflkjfs=", "AWSAccessKeyId": "ASIccccccNA", "x-amz-security-token": "FQofgF", "key": "bbb.file", "policy": "eyJjb25kaXRpb25zIjogW1siYfgfhudGVudC1sZjMyWiJ9"}}
When you paste that into a JSON validator (https://jsonlint.com/), you can see that it is valid JSON now.
I can't tell whether the performance is good enough for your use case, so you will have to test that.
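To get a rough feeling for the cost, a small timing loop is often enough. This is only a sketch and not a proper benchmark: `TimingSketch` and `averageMillis` are hypothetical names, and `String::trim` is a placeholder for your own method that wraps the Jython clean-up code above.

```java
import java.util.function.UnaryOperator;

public class TimingSketch {
    // Hypothetical helper: applies 'op' to 'input' 'runs' times and
    // returns the average time per call in milliseconds.
    public static double averageMillis(UnaryOperator<String> op, String input, int runs) {
        // One warm-up call so that one-off start-up cost is not counted.
        op.apply(input);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            op.apply(input);
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // String::trim is only a placeholder; pass your own clean-up method instead.
        double avg = averageMillis(String::trim, " {'a': 1} ", 1000);
        System.out.println("average ms per call: " + avg);
    }
}
```

When you measure the real thing, it is worth comparing a fresh PythonInterpreter per call against one reused instance, since the two will probably differ a lot.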
Remark:
In Eclipse there seems to be a bug with that Jython version. It shows the following error:
console: Failed to install '': java.nio.charset.UnsupportedCharsetException: cp0.
Although the code works anyway, you can get rid of the message by adding the following VM argument to your run configuration:
-Dpython.console.encoding=UTF-8
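A `-D` argument simply sets a JVM system property, so the same value can also be set from code. The sketch below assumes that Jython reads `python.console.encoding` from the system properties when it initializes, which means the call has to happen before the first PythonInterpreter is created; I have not verified this against every Jython version. `ConsoleEncodingSketch` and `ensureConsoleEncoding` are hypothetical names.

```java
public class ConsoleEncodingSketch {
    // Hypothetical helper: sets the property only if it was not supplied already
    // (for example via -D on the command line) and returns the effective value.
    public static String ensureConsoleEncoding(String encoding) {
        if (System.getProperty("python.console.encoding") == null) {
            System.setProperty("python.console.encoding", encoding);
        }
        return System.getProperty("python.console.encoding");
    }

    public static void main(String[] args) {
        // Assumption: call this before the first PythonInterpreter is created.
        System.out.println(ensureConsoleEncoding("UTF-8"));
    }
}
```

This keeps the setting with the code instead of in each developer's run configuration, while still letting an explicit `-D` value win.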
Remark 2: For the sake of completeness, and to fully answer the question, here is how you can deserialize the cleaned JSON.
Add the Gson dependency to your pom.xml:
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.2</version>
</dependency>
Create classes that represent the JSON structure:
Info class
public class Info {
    private String url;
    private Fields fields;

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public Fields getFields() {
        return fields;
    }

    public void setFields(Fields fields) {
        this.fields = fields;
    }
}
Fields class
import com.google.gson.annotations.SerializedName;

public class Fields {
    private String signature;
    private String AWSAccessKeyId;
    @SerializedName("x-amz-security-token")
    private String x_amz_security_token;
    private String key;
    private String policy;

    public String getSignature() {
        return signature;
    }

    public void setSignature(String signature) {
        this.signature = signature;
    }

    public String getAWSAccessKeyId() {
        return AWSAccessKeyId;
    }

    public void setAWSAccessKeyId(String aWSAccessKeyId) {
        AWSAccessKeyId = aWSAccessKeyId;
    }

    public String getX_amz_security_token() {
        return x_amz_security_token;
    }

    public void setX_amz_security_token(String x_amz_security_token) {
        this.x_amz_security_token = x_amz_security_token;
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public String getPolicy() {
        return policy;
    }

    public void setPolicy(String policy) {
        this.policy = policy;
    }
}
Finally, add the following code after you have obtained the cleaned JSON:
Gson gson = new Gson();
Info info = gson.fromJson(jsonCleaned.asString(), Info.class);