I have a JSON file that is being updated regularly by another Python script. I want to read the most recently appended data in that JSON file; in other words, I want to read the JSON file in a `tail -f` manner.

I know how to do this in a text file:

import os
import time

self.fileName = fileName
self.file = open(self.fileName, 'r')
self.st_results = os.stat(fileName)
self.st_size = self.st_results[6]          # st_size: current size of the file in bytes
self.file.seek(self.st_size)               # jump to the end of the file
while 1:
    where = self.file.tell()               # remember the current position
    line = self.file.readline()
    if not line:
        print("No line waiting, waiting for one second")
        time.sleep(1)
        self.file.seek(where)              # rewind and try again
    else:
        print(line)

I have tried this with the JSON file, but it didn't work, probably because `readline()` is not appropriate for it.

For example, my JSON:

{
    "created_at": "2018-05-05 06:57:58", 
    "id": 992659653782798338, 
    "sequence": 7, 
    "tweet": "RT @SirJadeja: Fun Fact: Stoinis Hit Hardik Pandya For 20 Runs In His Last Over. Brother Krunal Pandya(15) Together With Rohit Sharma(5) Hi\u2026"
}{
    "created_at": "2018-05-05 06:58:00", 
    "id": 992659660204208128, 
    "sequence": 8, 
    "tweet": "RT @surya_14kumar: A very important victory, achieved through a collective team effort. This will definitely set the momentum for us going\u2026"
}

I start my `raedTweets.py` script, which should print the most recently appended data. Suppose the last object appended to the file is:

{
    "created_at": "2018-05-05 06:59:43", 
    "id": 992660091508699137, 
    "sequence": 13, 
    "tweet": "RT @AndhraPolls: Even a simple inquiry would put #TDP leadership behind the bars.\n#AndhraPradesh #FridayFeeling #MahilaParaBJPSarkara #BJPV\u2026"
}

The output should be:

{ "created_at": "2018-05-05 06:59:43", "id": 992660091508699137, "sequence": 13, "tweet": "RT @AndhraPolls: Even a simple inquiry would put #TDP leadership behind the bars.\n#AndhraPradesh #FridayFeeling

MahilaParaBJPSarkara #BJPV\u2026" }
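Something like the sketch below is what I am aiming for: buffer whatever has been appended and pull complete objects off the front with `json.JSONDecoder.raw_decode()`. It assumes the file really is a stream of back-to-back JSON objects as shown above; `tail_json` and `tweets.json` are just placeholder names, and I am not sure this is the right approach:

import json
import time

def tail_json(path):
    """Yield JSON objects as they are appended to the file at `path`."""
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, "r") as f:
        f.seek(0, 2)                      # start at the end of the file, like tail -f
        while True:
            chunk = f.read()              # whatever has been appended since the last read
            if not chunk:
                time.sleep(1)             # nothing new yet, wait and retry
                continue
            buf += chunk
            while buf:
                buf = buf.lstrip()        # skip whitespace between objects
                try:
                    obj, end = decoder.raw_decode(buf)
                except ValueError:
                    break                 # the last object is still incomplete
                buf = buf[end:]
                yield obj

for tweet in tail_json("tweets.json"):    # "tweets.json" is a placeholder name
    print(tweet["sequence"], tweet["tweet"])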

  • `while True:` is more pythonic – Barmar May 05 '18 at 07:23
  • json always depends on the entire file. You are going to have to reparse the whole file. – MegaIng May 05 '18 at 07:31
  • What do you mean "the most recently appended data"? You should provide a [mcve] – juanpa.arrivillaga May 05 '18 at 07:38
  • @juanpa.arrivillaga I meant the way in which we read error logs using `tail -f` style. – Aviral Srivastava May 05 '18 at 07:58
  • @MegaIng So, should I not use json and instead use, say CSV? – Aviral Srivastava May 05 '18 at 07:59
  • That is not a [mcve]. That doesn't make any sense in the context of JSON. JSON supports two main container types, arrays and objects. Only arrays have a notion of order, so it isn't clear what you are expecting. – juanpa.arrivillaga May 05 '18 at 08:00
  • JSON has a syntax that needs to be preserved, so even if, say, a regular database backup produces a new file, the new data might not be appended at the end. What you really need is a revision system that can compare the changes made to the file and highlight them back to you. Git does this easily, or look for a Python tool that can do the same. Comment if you still need "some" support – Samuel Muiruri May 05 '18 at 08:12
  • @juanpa.arrivillaga I have not listed any example. I have listed a code which works perfectly fine for a text file. I want your help in reading a json file. Updated my question with an example. – Aviral Srivastava May 05 '18 at 08:14
  • also in context of parsing the data in python you'd likely use `json.load(old_data)` and `json.load(new_data)` and then probably use `set` to compare the two and get the difference e.g. `diff = new_json - old_json` – Samuel Muiruri May 05 '18 at 08:14
  • Will that be more expensive as compared to my proposed method with text files? @SamuelMuiruri – Aviral Srivastava May 05 '18 at 08:17
  • @AviralSrivastava To come back to your question. CSV would be perfect. – MegaIng May 05 '18 at 08:18
  • @AviralSrivastava speed wise I expect it will be as fast if not faster, also based on the example data you should be able to use `set` to find out new data. You can store the last check using pickle for example? – Samuel Muiruri May 05 '18 at 08:30
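To make the reparse-and-diff idea from the comments above concrete: dictionaries are not hashable, so a literal `new_json - old_json` will not work, but keeping a set of the `id` values already seen gives the same effect. A minimal sketch, assuming (unlike the concatenated objects shown in the question) the writer keeps the file as a single valid JSON array and every object carries a unique `id`; `poll_new_items` is a placeholder name:

import json
import time

def poll_new_items(path, interval=1):
    """Reparse the whole file on every poll and yield objects not seen before."""
    seen_ids = set()
    while True:
        with open(path, "r") as f:
            items = json.load(f)          # whole-file reparse, as the comments suggest
        for item in items:
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                yield item                # only objects with an unseen id reach here
        time.sleep(interval)

Note that this reparses the entire file on every poll, which is exactly the cost MegaIng's comment warns about, so it gets slower as the file grows.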

1 Answer

If the requirement is to save data in some (key, value) format, one should try CSV instead of JSON. I have a workaround for reading CSV files in a `tail -f` manner, which solved my issue. This answer provides the perfect solution.
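For reference, a `tail -f` style reader for the CSV approach might look like the sketch below. It assumes the writer appends one complete row (ending in a newline) per write and that the column names are known up front; since it starts at the end of the file, any header row is skipped anyway. `tail_csv` and `tweets.csv` are placeholder names:

import csv
import time

def tail_csv(path, fieldnames, interval=1):
    """Yield rows appended to a CSV file, tail -f style."""
    with open(path, "r", newline="") as f:
        f.seek(0, 2)                      # jump straight to the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(interval)      # nothing appended yet, poll again
                continue
            # parse the single appended line as one CSV record
            yield next(csv.DictReader([line], fieldnames=fieldnames))

for row in tail_csv("tweets.csv", ["created_at", "id", "sequence", "tweet"]):
    print(row["sequence"], row["tweet"])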