I have a 1.5 GB JSON file. It is supposed to contain an array of objects, but there is an extra comma after the last object in the array.

selah@wwbp:~$ tail -n4 /data/selah/diabetes_tweets.json 
    "type": "retweet:reply", 
    "citation_url": "http://twitter.com/Garthicus/status/5903085804"
},
]

I tried editing it with vi and some other text editors, but they all froze. Is there an easy programmatic way to remove this comma with Python?

Selah
  • Personally, I would truncate the file to be 3 chars shorter, then just append the \n and ] back onto it: http://www.tutorialspoint.com/python/file_truncate.htm – Vality Dec 22 '14 at 16:26
  • In general, open the file, seek to the end, start reading backwards until you find the first comma, then remove it by pasting the text you've read so far to the position before the current cursor. – Giulio Franco Dec 22 '14 at 16:27
  • Are you sure you need to remove the character at all? Trailing commas are permitted in some languages. Ex. In Python, `[1,2,3,]` is valid syntax. – Kevin Dec 22 '14 at 16:32
  • @Kevin the formal JSON standard does not allow a trailing comma. See also [this question](http://stackoverflow.com/questions/201782/can-you-use-a-trailing-comma-in-a-json-object). – Random832 Dec 22 '14 at 16:36
  • @Selah how was the file generated? It'd be easier to fix it in the first place. – Random832 Dec 22 '14 at 16:37
  • it was generated from a script that makes API requests... it takes hours to run – Selah Dec 22 '14 at 16:54
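The truncate and seek-from-the-end suggestions in the comments above can be sketched in Python roughly as follows. This is only a sketch under some assumptions: the function name `remove_trailing_comma` and the `window` parameter are made up for illustration, the file is assumed to end with the array's closing `]` (plus optional whitespace), and the stray comma is assumed to sit within the last `window` bytes. Only the tail of the file is read and rewritten, so the 1.5 GB body is never loaded.

```python
import os


def remove_trailing_comma(path, window=1024):
    """Remove a comma that directly precedes the final ']' of a JSON file.

    Hypothetical helper: only the last `window` bytes are read and rewritten.
    Returns True if a comma was removed, False if there was nothing to fix.
    """
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)      # seek() returns the new position
        start = max(0, size - window)
        f.seek(start)
        tail = f.read()

        close = tail.rfind(b"]")           # assumed to be the array's closing bracket
        if close == -1:
            raise ValueError("no closing ']' found in the last %d bytes" % window)

        before = tail[:close].rstrip()     # drop whitespace between the comma and ']'
        if not before.endswith(b","):
            return False

        # Keep everything before the comma, then put back whatever followed it
        # (whitespace, the ']' and any final newline).
        fixed = before[:-1] + tail[len(before):]
        f.seek(start)
        f.write(fixed)
        f.truncate()                       # the file is now one byte shorter
        return True
```

Called as `remove_trailing_comma("/data/selah/diabetes_tweets.json")`, this edits the file in place, so it is probably worth trying on a copy of the last few kilobytes first.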

2 Answers

Use this to remove the last two lines:

head -n -2 myfile.txt > myfile_fix.txt

Then add back what you need:

echo '}' >> myfile_fix.txt
echo ']' >> myfile_fix.txt

Selah
Camron_Godbout
  • This doesn't work for me on a mac: `head: illegal line count -- -2`. Maybe in a different shell `-2` works? Also, I thought `head` would only print the lines. Anyway this solution relates to: http://stackoverflow.com/questions/4881930/bash-remove-the-last-line-from-a-file – mattsilver Dec 22 '14 at 16:40
  • "Note that this works with some versions of head, but is not standard. Indeed, the standard for head states: The application shall ensure that the number option-argument is a positive decimal integer." - http://stackoverflow.com/users/140750/william-pursell – Camron_Godbout Dec 22 '14 at 16:51
  • Maybe a more robust solution would be to use wc to get the number of lines and then subtract 2? However, -n -2 worked for me on Ubuntu 14.04 – Selah Dec 22 '14 at 16:59
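Since a negative count for `head -n` is not portable, the same "drop the last two lines" step could also be done in Python, which is what the question asked for. The sketch below is an assumption-laden illustration, not part of the answer above: `drop_last_lines` is a hypothetical name, and it streams the file line by line while holding back the most recent `n` lines, so memory use stays small even for a 1.5 GB input.

```python
from collections import deque


def drop_last_lines(src, dst, n=2):
    """Copy src to dst without its final n lines (hypothetical helper).

    Lines are held in a small buffer and written only once n newer
    lines have been seen, so the last n lines are never written.
    """
    pending = deque()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            pending.append(line)
            if len(pending) > n:
                fout.write(pending.popleft())
```

After `drop_last_lines("myfile.txt", "myfile_fix.txt")`, the closing `}` and `]` still need to be appended, as in the `echo` commands of the answer.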

You could fix this with the following Node.js script:

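// JavaScript array literals tolerate a trailing comma, so eval() accepts what JSON.parse() rejects.
// Only run this on a file you trust: eval() executes whatever the file contains.
// The whole file is read into memory, which may not work for a file as large as 1.5 GB.
var fs = require('fs');
var data = fs.readFileSync(process.argv[2], 'utf-8');
console.log(JSON.stringify(eval("(" + data + ")")));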

The script prints the fixed JSON to standard output, so redirect it to a new file:

node fix.js your.json > fixed.json

thammi