I am using Json.NET to deserialize and work with a file containing anywhere from one to thousands of JSON-formatted Twitter tweets (think of it as a large text-file archive of individual JSON objects). For simplicity and illustration purposes, I cut the JSON for each tweet down to only four lines, but each one actually has dozens of properties.

I am not creating the file containing the tweets in JSON - it comes to me in this form from a database that I do not control or have access to. Here is an example of a file that contains one tweet:

{
    "created_at": "Thu Mar 20 02:59:35 +0000 2009",
    "text": "meet you later tonight"
}

Here is an example of a file that contains two tweets:

{
    "created_at": "Thu Mar 21 02:59:35 +0000 2010",
    "text": "meet you later"
}
{
    "created_at": "Thu Mar 22 02:59:35 +0000 2010",
    "text": "see you tonight"
}

Issue: How do I deserialize a file into a list of tweet objects when it may contain one or more JSON objects?

Working solution #1: Turn the tweets stored in JSON format into an array, even if the file has only one tweet. In addition, if there is more than one tweet, a comma needs to be inserted after each tweet except the last one to put the tweets in standard JSON format. After inserting commas as needed, the following code works:

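 // Requires: using System.Collections.Generic; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
 // Wrap the comma-separated tweets in brackets so the text parses as one JSON array.
 JArray o = JArray.Parse("[" + textFileOfOneOrMoreTweetsInStandardJson + "]");
 List<Tweet> tweets = JsonConvert.DeserializeObject<List<Tweet>>(o.ToString());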

Solution #2: After placing commas in as needed, create and use a custom converter that overrides ReadJson, determines whether there is just one tweet or more than one, and deserializes accordingly. Example of using this solution: Deserializing JSON when sometimes array and sometimes object
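For illustration, here is a minimal sketch of what such a converter might look like. The names `Tweet`, `SingleOrArrayConverter<T>` and `TweetParser` are hypothetical and only for this example, and the `Tweet` class is cut down to the two properties shown above. It assumes the input is either one JSON object or one JSON array, so it does not remove the need for the commas and brackets.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical, cut-down tweet type (the real one has dozens of properties).
public class Tweet
{
    [JsonProperty("created_at")]
    public string CreatedAt { get; set; }

    [JsonProperty("text")]
    public string Text { get; set; }
}

// Hypothetical converter: reads either a single object or an array into a List<T>.
public class SingleOrArrayConverter<T> : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<T>);
    }

    public override object ReadJson(JsonReader reader, Type objectType,
                                    object existingValue, JsonSerializer serializer)
    {
        // Load the whole value first, then branch on whether it is an array.
        JToken token = JToken.Load(reader);
        if (token.Type == JTokenType.Array)
        {
            return token.ToObject<List<T>>();
        }
        return new List<T> { token.ToObject<T>() };
    }

    // This sketch only reads; writing is not needed for the analysis.
    public override bool CanWrite
    {
        get { return false; }
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

public static class TweetParser
{
    public static List<Tweet> Parse(string json)
    {
        return JsonConvert.DeserializeObject<List<Tweet>>(json, new SingleOrArrayConverter<Tweet>());
    }
}
```

With this sketch, `TweetParser.Parse(text)` would return a one-element list for a single object and an n-element list for an array of n objects.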

I am using the former - is there any major downside to using solution #1 over solution #2? Is there an alternative solution I have not considered?
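One alternative that might avoid inserting commas and brackets altogether is to let the reader consume several top-level JSON values from the same stream. The sketch below assumes a Json.NET version in which `JsonTextReader` exposes the `SupportMultipleContent` property. The `Tweet` class and the `TweetArchive.ReadTweets` helper are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical, cut-down tweet type (the real one has dozens of properties).
public class Tweet
{
    [JsonProperty("created_at")]
    public string CreatedAt { get; set; }

    [JsonProperty("text")]
    public string Text { get; set; }
}

public static class TweetArchive
{
    // Reads one or more back-to-back JSON objects from the text into a list.
    public static List<Tweet> ReadTweets(TextReader text)
    {
        var tweets = new List<Tweet>();
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            // Allow several top-level JSON values in a single stream.
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                // Each top-level object is deserialized on its own.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    tweets.Add(serializer.Deserialize<Tweet>(reader));
                }
            }
        }
        return tweets;
    }
}
```

A call such as `TweetArchive.ReadTweets(File.OpenText("tweets.json"))` (the file name is a placeholder) would then stream the archive one object at a time, without building one large string first.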

Unresolved issue: How can I validate the archive of individual JSON objects against a schema before deserializing? If I treat the objects as an array as I did above, the Json.NET validation tool stops after validating the first object and ignores the rest of the objects. I guess I could go through them one at a time and validate each of them before deserializing, but that seems inefficient and will be time-consuming if there are thousands of tweets.
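To make the "one at a time" idea concrete, here is a rough sketch of per-object validation. It assumes the schema classes that ship inside Json.NET itself (`JsonSchema` and the `IsValid` extension method in `Newtonsoft.Json.Schema`, which newer releases mark as obsolete in favor of a separate package) and a reader with `SupportMultipleContent`. `TweetValidator.FindInvalid` is a hypothetical helper name, and I have not measured how it performs on thousands of tweets.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

public static class TweetValidator
{
    // Returns the zero-based positions of the objects that fail validation.
    public static List<int> FindInvalid(TextReader text, string schemaJson)
    {
        // Parse the single-tweet schema once and reuse it for every object.
        JsonSchema schema = JsonSchema.Parse(schemaJson);
        var invalid = new List<int>();
        int index = 0;
        using (var reader = new JsonTextReader(text))
        {
            // Allow several top-level JSON values in a single stream.
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType != JsonToken.StartObject)
                {
                    continue;
                }
                // Load just this object, so only one tweet is in memory at a time.
                JObject tweet = JObject.Load(reader);
                if (!tweet.IsValid(schema))
                {
                    invalid.Add(index);
                }
                index++;
            }
        }
        return invalid;
    }
}
```

Because each object is checked against the single-tweet schema, the schema itself does not need to know how many tweets the file holds.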

In sum, JSON is not really designed for archiving between one and thousands of individual JSON objects together in a single text file, so it has been an uphill battle to work with the file.

Side note: I do not care about serialization, I just need to analyze the tweets with LINQ by getting them into a list of tweet objects. It would be nice to validate each tweet against a schema before deserializing, but that may be too difficult or time consuming.

Because I could not find it anywhere on the Net and it may help someone else, I am providing below the schema I created for Twitter tweets based on Twitter's API. Note, however, that even though the API indicates the fields are not optional, they are optional in my archive, so I had to mark them as optional for validation purposes (it is amazing that one tweet has the potential to contain dozens of JSON properties!).

Twitter Tweet JSON schema

{
  "$schema": "http://json-schema.org/schema#",
  "description": "Twitter API JSON Schema",
  "type": "object",
  "properties": {
    "annotations":{
      "optional":true,
      "type":[
        "object",
        "null"
      ],
      "properties":{
      }
    },
    "contributors":{
      "optional":true,
      "type":[
        "array",
        "null"
      ],
      "items":{
        "type":[
          "object",
          "null"
        ],
        "properties":{
          "id":{
            "optional":true,
            "type":"integer"
          },
          "id_str":{
            "optional":true,
            "type":"string"
          },
          "screen_name":{
            "optional":true,
            "type":"string"
          }
        }
      }
    },
    "coordinates":{
      "optional":true,
      "type":[
        "object",
        "null"
      ],
      "properties":{
        "coordinates":{
          "optional":true,
          "type":[
            "array",
            "number"
          ]
        },
        "type":{
          "optional":true,
          "type":"string"
        }
      }
    },
    "created_at":{
      "optional":true,
      "type":"string"
    },
    "entities":{
      "optional":true,
      "type":"object",
      "properties":{
        "hashtags":{
          "optional":true,
          "type":"array",
          "items":{
            "type":"object",
            "properties":{
              "indices":{
                "type":"array",
                "items":{
                  "type":"integer"
                }
              }
            }
          }
        },
        "media":{
          "optional":true,
          "type":"array",
          "items":{
            "type":"object",
            "properties":{
              "display_url":{
                "optional":true,
                "type":"string"
              },
              "expanded_url":{
                "optional":true,
                "type":"string"
              },
              "id":{
                "optional":true,
                "type":"integer"
              },
              "id_str":{
                "optional":true,
                "type":"string"
              },
              "indices":{
                "optional":true,
                "type":"array",
                "items":{
                  "type":"integer"
                }
              },
              "media_url":{
                "optional":true,
                "type":"string"
              },
              "media_url_https":{
                "optional":true,
                "type":"string"
              },
              "sizes":{
                "optional":true,
                "type":"object",
                "properties":{
                  "thumb":{
                    "optional":true,
                    "type":"object",
                    "properties":{
                      "h":{
                        "optional":true,
                        "type":"integer"
                      },
                      "resize":{
                        "optional":true,
                        "type":"string"
                      },
                      "w":{
                        "optional":true,
                        "type":"integer"
                      }
                    }
                  },
                  "large":{
                    "optional":true,
                    "type":"object",
                    "properties":{
                      "h":{
                        "optional":true,
                        "type":"integer"
                      },
                      "resize":{
                        "optional":true,
                        "type":"string"
                      },
                      "w":{
                        "optional":true,
                        "type":"integer"
                      }
                    }
                  },
                  "medium":{
                    "optional":true,
                    "type":"object",
                    "properties":{
                      "h":{
                        "optional":true,
                        "type":"integer"
                      },
                      "resize":{
                        "optional":true,
                        "type":"string"
                      },
                      "w":{
                        "optional":true,
                        "type":"integer"
                      }
                    }
                  },
                  "small":{
                    "optional":true,
                    "type":"object",
                    "properties":{
                      "h":{
                        "optional":true,
                        "type":"integer"
                      },
                      "resize":{
                        "optional":true,
                        "type":"string"
                      },
                      "w":{
                        "optional":true,
                        "type":"integer"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "urls":{
          "optional":true,
          "type":"array",
          "items":{
            "type":"object",
            "properties":{
              "display_url":{
                "optional":true,
                "type":"string"
              },
              "expanded_url":{
                "optional":true,
                "type":"string"
              },
              "indices":{
                "optional":true,
                "type":"array"
              },
              "url":{
                "optional":true,
                "type":"string"
              }
            }
          }
        },
        "user_mentions":{
          "optional":true,
          "type":"array",
          "items":{
            "type":"object",
            "properties":{
              "id":{
                "optional":true,
                "type":"integer"
              },
              "id_str":{
                "optional":true,
                "type":"string"
              },
              "indices":{
                "optional":true,
                "type":"array"
              },
              "name":{
                "optional":true,
                "type":"string"
              },
              "screen_name":{
                "optional":true,
                "type":"string"
              }
            }
          }
        }
      }
    },
    "favorited":{
      "optional":true,
      "type":"boolean"
    },
    "geo":{
      "optional":true,
      "type":[
        "object",
        "null"
      ],
      "properties":{
      }
    },
    "id":{
      "optional":true,
      "type":"integer"
    },
    "id_str":{
      "optional":true,
      "type":"string"
    },
    "in_reply_to_screen_name":{
      "optional":true,
      "type":[
        "string",
        "null"
      ]
    },
    "in_reply_to_status_id":{
      "optional":true,
      "type":[
        "integer",
        "null"
      ]
    },
    "in_reply_to_status_id_str":{
      "optional":true,
      "type":[
        "string",
        "null"
      ]
    },
    "in_reply_to_user_id":{
      "optional":true,
      "type":[
        "integer",
        "null"
      ]
    },
    "in_reply_to_user_id_str":{
      "optional":true,
      "type":[
        "string",
        "null"
      ]
    },
    "place":{
      "optional":true,
      "type":[
        "object",
        "null"
      ],
      "properties":{
        "country":{
          "optional":true,
          "type":"string"
        },
        "country_code":{
          "optional":true,
          "type":"string"
        },
        "full_name":{
          "optional":true,
          "type":"string"
        },
        "id":{
          "optional":true,
          "type":"string"
        },
        "name":{
          "optional":true,
          "type":"string"
        },
        "place_type":{
          "optional":true,
          "type":"string"
        },
        "url":{
          "optional":true,
          "type":"string"
        },
        "bounding_box":{
          "optional":true,
          "type":"object",
          "properties":{
            "coordinates":{
              "optional":true,
              "type":"array"
            },
            "type":{
              "optional":true,
              "type":"string"
            }
          }
        }
      }
    },
    "retweet_count":{
      "optional":true,
      "type":[
        "integer",
        "string"
      ]
    },
    "retweeted":{
      "optional":true,
      "type":"boolean"
    },
    "source":{
      "optional":true,
      "type":"string"
    },
    "text":{
      "optional":true,
      "type":"string"
    },
    "truncated":{
      "optional":true,
      "type":"boolean"
    },
    "user":{
      "optional":true,
      "type":"object",
      "properties":{
        "contributors_enabled":{
          "optional":true,
          "type":"boolean"
        },
        "created_at":{
          "optional":true,
          "type":"string"
        },
        "default_profile":{
          "optional":true,
          "type":"boolean"
        },
        "default_profile_image":{
          "optional":true,
          "type":"boolean"
        },
        "description":{
          "optional":true,
          "type":[
            "string",
            "null"
          ]
        },
        "favourites_count":{
          "optional":true,
          "type":"integer"
        },
        "follow_request_sent":{
          "optional":true,
          "type":[
            "boolean",
            "null"
          ]
        },
        "following":{
          "optional":true,
          "type":[
            "string",
            "null"
          ]
        },
        "followers_count":{
          "optional":true,
          "type":"integer"
        },
        "friend_count":{
          "optional":true,
          "type":"integer"
        },
        "geo_enabled":{
          "optional":true,
          "type":"boolean"
        },
        "id":{
          "optional":true,
          "type":"integer"
        },
        "id_str":{
          "optional":true,
          "type":"string"
        },
        "is_translator":{
          "optional":true,
          "type":"boolean"
        },
        "lang":{
          "optional":true,
          "type":"string"
        },
        "listed_count":{
          "optional":true,
          "type":"integer"
        },
        "location":{
          "optional":true,
          "type":[
            "string",
            "null"
          ]
        },
        "name":{
          "optional":true,
          "type":"string"
        },
        "notifications":{
          "optional":true,
          "type":"null"
        },
        "profile_background_color":{
          "optional":true,
          "type":"string"
        },
        "profile_background_image_url":{
          "optional":true,
          "type":"string"
        },
        "profile_background_image_url_https":{
          "optional":true,
          "type":"string"
        },
        "profile_background_tile":{
          "optional":true,
          "type":"boolean"
        },
        "profile_image_url":{
          "optional":true,
          "type":"string"
        },
        "profile_image_url_https":{
          "optional":true,
          "type":"string"
        },
        "profile_link_color":{
          "optional":true,
          "type":"string"
        },
        "profile_sidebar_border_color":{
          "optional":true,
          "type":"string"
        },
        "profile_sidebar_fill_color":{
          "optional":true,
          "type":"string"
        },
        "profile_text_color":{
          "optional":true,
          "type":"string"
        },
        "profile_use_background_image":{
          "optional":true,
          "type":"boolean"
        },
        "protected":{
          "optional":true,
          "type":"boolean"
        },
        "screen_name":{
          "optional":true,
          "type":"string"
        },
        "show_all_inline_media":{
          "optional":true,
          "type":"boolean"
        },
        "statuses_count":{
          "optional":true,
          "type":"integer"
        },
        "time_zone":{
          "optional":true,
          "type":[
            "string",
            "null"
          ]
        },
        "url":{
          "optional":true,
          "type":[
            "string",
            "null"
          ]
        },
        "utc_offset":{
          "optional":true,
          "type":[
            "integer",
            "null"
          ]
        },
        "verified":{
          "optional":true,
          "type":"boolean"
        }
      }
    }
  }
}
  • "In sum, JSON is not really designed for archiving between one and thousands of individual JSON objects together in a single text file..." I disagree; you can use [ ,, ,, ] to hold collections of JSON objects and your file would still be a valid JSON file. – mpm Mar 04 '12 at 02:12
  • "Unresolved issue: How to validate the archive of individual JSON objects against a schema before deserializing?" If your file is not a valid JSON file, you can't validate anything for sure. – mpm Mar 04 '12 at 02:17
  • Thanks - yes, adding brackets makes many JSON tweets valid JSON (transforms them into one big array), but the schema is for only one tweet, not an array of tweets that can have different sizes, so even though each tweet is valid JSON and with brackets a group is a valid JSON array, the schema will not verify. I would need a schema for every possibility of array size from 2 tweets to thousands of tweets for the schema to validate an array of tweets. – user1247478 Mar 14 '12 at 03:26

0 Answers