We are working on a program to fetch slide image data from a group of servers that doesn't have a consistent schema setup (I'm worried it's invalid, but I'm not proficient enough to make that call). We have no influence over the servers as independent-unrelated researchers.
The data was entered manually (for the most part), through a large series of forms (n>50), with inconsistent fields (data goes back to the 90's). Here is an example of a response:
{
"form12873": [
{
"id": "9202075838",
"timestamp": "2015-06-25 10:24:51",
"user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3",
"remote_addr": "[Re.dact.ed]",
"processed": "1",
"data": {
"33885124": {
"field": "33885124",
"value": "CDat Lab",
"flat_value": "CDat Lab",
"label": "Completed by:",
"type": "select"
},
''**Several more fields as above**''...
"33884660": {
"field": "33884660",
"value": {
"slideX": "2456123",
"slideY": "456632",
"label": "K-20150322148",
"approved": "1",
"score": "30144"
},
"flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144",
"label": "Slide Stats:",
"type": "slidestats"
},
''**Some of the fields are as above...
"31970564": {
"field": "31970564",
"value": [
"System",
"Crated",
"Mirax",
"NanoZoomer",
"ThinPrep",
"Aperio",
"Intellisite"
],
"flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite",
"label": "System Information",
"type": "checkbox"
},
''**Some of the values are Arrays...
"33883781": {
"field": "33883781",
"selection": "Retain",
"label": "4. Retain\/Remove\/Review",
"type": "selectdrop"
},
''**Some of the fields don't have the same children
"52792890": {
"field": "52792890",
"image": "'A really large byte[], removed for ease of reading'",
"type": "image"
}
''**Somewhere near the end of each response is the actual image...
}
},
{
"id": "33884681",
''**Then it continues on as above until the end:
}
], "total": 170, "pages": 5, "pretty_id": "478125624983" }
In the past when I've been able to model/class for the structure of the JSON, I've known how to handle it (make a data class with field, value, etc defined).
Trying solutions like:
var result = JsonConvert.DeserializeObject<List<Dictionary<string,
Dictionary<string, string>>>>(content);
Always led to array errors or cast issues (even if direct casts were added). I am able to get the actual first array using:
Public Shared Function Tabulate(json As String) As DataTable
Dim jsonLinq = Newtonsoft.Json.Linq.JObject.Parse(json)
' Find the first array using Linq
Dim srcArray = jsonLinq.Descendants().Where(Function(d) TypeOf d Is JArray).First()
Dim trgArray = New Newtonsoft.Json.Linq.JArray()
For Each row As JObject In srcArray.Children(Of JObject)()
Dim cleanRow = New JObject()
For Each column As JProperty In row.Properties()
' Only include JValue types
If TypeOf column.Value Is JValue Then
cleanRow.Add(column.Name, column.Value)
End If
Next
trgArray.Add(cleanRow)
Next
Return JsonConvert.DeserializeObject(Of DataTable)(trgArray.ToString())
End Function
My end goal is also to get to a data table, and looping/the image byte have me concerned about trying to regressively go down to further children. My attempts to then deserialize using the first array have then come up nill.
If there is a quick way to handle this, I'd love the solution. If the problem is that I'm trying to handle crap JSON, I'd love a reference to where the current standard is being broken (so I can at least try to get the other institution to change their servers). That said, I'll probably have to deal with it anyway, even if it is loops.
*Note: The project was started in VB.net, so we've kept it that way, but I may decide to port to C# anyway. Code in either would be great.
Below is an unmarked example of the Json that should be usable for testing. My end goal is to flatten this into a datatable:
{
"form12873": [
{
"id": "9202075838",
"timestamp": "2015-06-25 10:24:51",
"user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3",
"remote_addr": "[Re.dact.ed]",
"processed": "1",
"data": {
"33885124": {
"field": "33885124",
"value": "CDat Lab",
"flat_value": "CDat Lab",
"label": "Completed by:",
"type": "select"
},
"33884660": {
"field": "33884660",
"value": {
"slideX": "2456123",
"slideY": "456632",
"label": "K-20150322148",
"approved": "1",
"score": "30144"
},
"flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144",
"label": "Slide Stats:",
"type": "slidestats"
},
"31970564": {
"field": "31970564",
"value": [
"System",
"Crated",
"Mirax",
"NanoZoomer",
"ThinPrep",
"Aperio",
"Intellisite"
],
"flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite",
"label": "System Information",
"type": "checkbox"
},
"33883781": {
"field": "33883781",
"selection": "Retain",
"label": "4. Retain\/Remove\/Review",
"type": "select"
}
}
}
], "total": 170, "pages": 5, "pretty_id": "478125624983" }