
Forgive me if this is a trivial question; I don't have much experience with this.

I have a file that looks like this:

{text},
{text},
{text},
{text},

I want it to look like this:

[{text},
 {text},
 {text},
 {text}]

Note that the final comma is removed, and that there are now square brackets at the beginning and end of the file.

So, I have thousands of files in a directory, and each file has to be fixed this way.

I'm guessing I have to use sed somehow, but I don't really know how to make it happen, and I don't want to do it manually in Vim since there are so many files...

EDIT:

I tried to use:

sed -i '1s/^/\[/;$s/,$/\]/' *

as suggested by codeforester. I get an error saying "Argument list too long"...

shishy

1 Answer


I would remove the existing comma at the end of each line with sed and then use jq to build the JSON array:

sed 's/,$//' file | jq -s .
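
To make the effect of that pipeline concrete, here is a minimal sketch on a made-up two-object sample. The file name sample.json and its contents are assumptions for the demo, and -c is only added so that jq prints the array on one line instead of pretty-printing it.

```shell
#!/bin/bash
# Hypothetical sample: one JSON object per line, each followed by a comma.
tmpdir=$(mktemp -d)
printf '%s\n' '{"id":1},' '{"id":2},' > "$tmpdir/sample.json"

# Strip the trailing comma from every line, then let jq slurp (-s)
# the resulting stream of objects into a single array.
result=$(sed 's/,$//' "$tmpdir/sample.json" | jq -c -s .)
echo "$result"

rm -r "$tmpdir"
```

Note that jq -s expects the objects to be separated by whitespace only, which is why the comma is removed from every line and not just the last one.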

To run this over many files, I recommend creating a small shell script:

fix-json.sh

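#!/bin/bash
# Usage: fix-json.sh FILE
file="${1}"
# Strip trailing commas, then slurp all objects into one JSON array.
sed 's/,$//' "${file}" | jq -s . > "${file}.tmp"
# PIPESTATUS[1] is the exit status of jq; non-zero means it could not parse the input.
if [ "${PIPESTATUS[1]}" -ne 0 ] ; then
    echo "${file} is broken"
    rm -f "${file}.tmp"
else
    mv "${file}.tmp" "${file}"
fi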

Now use find to run the above script on the input files:

chmod +x fix-json.sh 
find /path/to/files -type f -name '*.json' -exec ./fix-json.sh {} \;
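
If jq is not available, or the files should keep the one-object-per-line layout shown in the question, a sed-only variant is possible. This is only a sketch and not the approach above: it assumes GNU sed (for -i), assumes the last line of each file holds the final object, and does no JSON validation at all. The function name fix_dir is made up for illustration, and /path/to/files is a placeholder. Handing the files to sed through find -exec ... {} + avoids the "Argument list too long" error that a shell glob produces.

```shell
#!/bin/bash
# fix_dir DIR (hypothetical helper): edit every regular file directly in DIR in place.
fix_dir() {
    # GNU sed -i treats each file separately, so 1 and $ mean
    # "first line" and "last line" of each individual file.
    #   1s/^/[/     put an opening bracket before the first line
    #   2,$s/^/ /   indent the remaining lines by one space
    #   $s/,$/]/    turn the final trailing comma into a closing bracket
    find "$1" -maxdepth 1 -type f \
        -exec sed -i -e '1s/^/[/' -e '2,$s/^/ /' -e '$s/,$/]/' {} +
}

# Usage (placeholder path; back up the directory first):
# fix_dir /path/to/files
```

Unlike the jq version, this will happily "fix" files that are not valid JSON, so it is best tried on a copy of a few files first.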
hek2mgl
  • I copied the script you wrote and ran it as written (except instead of -name '*.json', I just did -name '*'). Either way, when I did that nothing happened to the files, i.e. the brackets weren't there and the final comma wasn't removed. Also, I only need to remove the comma on the last line, not on all of the lines – shishy Jan 12 '17 at 21:11
  • Sure, if you use `-name ''` no file will match. If you don't care about the filename, omit the `-name` option completely. But be careful and create a backup before you run this (just to be sure that you didn't forget to mention something important in the question). And keep in mind that `find` runs recursively, meaning it will also process files in subfolders. If you don't want that, use `-maxdepth 1` – hek2mgl Jan 12 '17 at 21:12
  • I have all files in the same directory (collapsed all subfolders) – shishy Jan 12 '17 at 21:15
  • Glad to hear that! :) – hek2mgl Jan 12 '17 at 21:20
  • I spoke too soon. Some of the files give an error saying sed: //name.tmp: No such file or directory. I also see a message saying "parse error: Expected separator between values at line 4, column 8". – shishy Jan 12 '17 at 21:22
  • It worked when I tried it on a sample of 5 files but the entire directory had an issue. Maybe there are corrupt files...? – shishy Jan 12 '17 at 21:22
  • Yeah, looks like it. I explicitly used `jq` to detect that. It looks like your files have more problems than the missing `[...]` – hek2mgl Jan 12 '17 at 21:24
  • This is just Twitter data that I streamed into a bunch of files of 5 MB each. So each JSON object is a tweet, and the tweets are comma separated. I'm still running it because it seems like it's still going, but there do seem to be some problematic files in there. – shishy Jan 12 '17 at 21:25
  • For reference, the total size of the directory is 30GB so there's a lot of files... – shishy Jan 12 '17 at 21:26
  • I've enhanced the error handling in the shell script a bit. I can't do more than that. – hek2mgl Jan 12 '17 at 21:32
  • Great job @hek2mgl. I have removed my answer. – codeforester Jan 12 '17 at 21:47