I have a JSON file of timestamped data points and need a histogram showing the number of data points per unit of time. The data is in the following format:

database = {
  "data": [
    {
      "timestamp": "Mon Aug 01 00:00:01 +0000 1901",
      "user": 796327373691985921,
      "text": "blah blah there were no tweets in 1901!?!",
      "polarity": 0.2,
      "subjectivity": 0.2
    },
    {
      "timestamp": "Mon Aug 01 00:00:10 +0000 1901",
      "user": 16548385,
      "text": "blah blah blah",
      "polarity": 0.0,
      "subjectivity": 0.0
    }
  ]
}

etc

I am having trouble picking the timestamp item out of the dictionary. For instance, when I run print(database["data"][0]["timestamp"]), I get the timestamp for one data point, but how do I organize all the tweets into time buckets based on their timestamps? I suspect a loop is required, but I don't know how to proceed. Thank you again!

1 Answer

1) Convert your timestamps into seconds since the beginning of the day (using datetime.timedelta perhaps).

2) Now, create the histogram with fixed bin edges:

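import matplotlib.pyplot as plt
# 25 edges (0, 3600, ..., 86400) give 24 one-hour bins; data holds seconds since midnight
edges = list(range(0, 24 * 3600 + 1, 3600))
plt.hist(data, edges)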
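Step 1 could look something like the sketch below. It is only one possible approach, not necessarily what was meant above. The format string is an assumption inferred from the sample timestamp in the question, the helper name seconds_since_midnight is made up for illustration, and parsing %a and %b relies on an English locale.

```python
from datetime import datetime, timedelta

# Assumed format, inferred from the sample "Mon Aug 01 00:00:01 +0000 1901".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def seconds_since_midnight(timestamp):
    """Return the time-of-day part of a timestamp string as whole seconds."""
    moment = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    # Build a timedelta from the clock fields only, ignoring the date.
    time_of_day = timedelta(
        hours=moment.hour, minutes=moment.minute, seconds=moment.second
    )
    return int(time_of_day.total_seconds())


print(seconds_since_midnight("Mon Aug 01 00:00:10 +0000 1901"))
```

Applying this helper to every timestamp would give the list of numbers that plt.hist expects as its first argument.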
honza_p
  • Thank you for your help. I am having trouble picking the timestamp item out of the dictionary. For instance, print(database["data"][0]["timestamp"]) gives me the timestamp for one data point, and I can manipulate that with datetime.timedelta, but how do I apply those changes to all data points? I suspect a loop is required, but I don't know how to proceed. Thank you again! – F0restPerson Sep 07 '18 at 18:55
  • I'd suggest that you use a list comprehension to extract this. See https://www.pythonforbeginners.com/basics/list-comprehensions-in-python – honza_p Sep 10 '18 at 12:29
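As a rough illustration of the list comprehension idea, the sketch below pulls every timestamp out of the dictionary and counts data points per hour of the day. The third data point is a hypothetical addition so that more than one bucket appears, the format string is an assumption based on the sample data, and counting with collections.Counter is just one option alongside plt.hist.

```python
from collections import Counter
from datetime import datetime

# Assumed format, inferred from the sample timestamps in the question.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"

database = {
    "data": [
        {"timestamp": "Mon Aug 01 00:00:01 +0000 1901", "text": "blah"},
        {"timestamp": "Mon Aug 01 00:00:10 +0000 1901", "text": "blah blah"},
        # Hypothetical extra point, added only for this illustration.
        {"timestamp": "Mon Aug 01 13:45:00 +0000 1901", "text": "made up"},
    ]
}

# One list comprehension replaces the explicit loop over database["data"].
timestamps = [point["timestamp"] for point in database["data"]]

# Parse each string, then count how many data points fall into each hour.
parsed = [datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps]
per_hour = Counter(moment.hour for moment in parsed)

for hour in sorted(per_hour):
    print(f"{hour:02d}:00  {per_hour[hour]}")
```

Swapping moment.hour for another field, such as moment.date(), would bucket by day instead of by hour.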