
OS: Ubuntu 14.04

I have 12 large JSON files (2-4 GB each) that I want to perform several operations on: remove the first line, replace every "}," with "}", and remove all "]".

I am using sed to do the operations and my command is:

sed -i.bak -e '1d' -e 's/},/}/g' -e '/]/d' file.json

When I run the command on a small file (12.7 KB) it works fine: file.json contains the modified content and file.json.bak contains the original content.

But when I run the command on my larger files, the original file is emptied: file.json is empty and file.json.bak contains the original content. The run time also seems too fast to me, about 2-3 seconds.

What am I doing wrong here?

kongshem

1 Answer


Are you sure your input file contains newlines as recognized by the platform you are running your commands on? If it doesn't then deleting one line would delete the whole file. What does wc -l < file tell you?
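As a rough illustration of that first possibility, here is a small sketch. The demo file is made up (a temporary file created with mktemp, not one of the files from the question); it holds a JSON array with no newline in it, and the script then runs the same sed script from the question against it:

```shell
# Hypothetical demo file: a JSON array with no newline anywhere in it.
demo=$(mktemp)
printf '[{"a":1},{"b":2}]' > "$demo"

# wc -l counts newline characters, so a file without any reports 0.
# (tr strips the padding some wc implementations add.)
lines=$(wc -l < "$demo" | tr -d ' ')

# With no newline, sed sees the whole file as line 1, so '1d' deletes it all.
out=$(sed -e '1d' -e 's/},/}/g' -e '/]/d' "$demo")

echo "lines=$lines bytes_after_sed=${#out}"
rm -f "$demo"
```

If both numbers come out as 0 for a file shaped like yours, the empty result is explained by the file being one long line rather than by anything going wrong inside sed.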

If it's not that then you probably don't have enough file space to duplicate the file so sed is doing something internally like

mv file backup && sed '...' backup > file

but doesn't have space to create the new file after moving the original to backup. Check your available file space and if you don't have enough and can't get more then you'll need to do something like:

while [ -s oldfile ]
do
    copy first N bytes of oldfile into tmpfile &&
    remove first N bytes from oldfile using real inplace editing &&
    sed 'script' tmpfile >> newfile &&
    rm -f tmpfile
done
mv newfile oldfile

See https://stackoverflow.com/a/17331179/1745001 for how to remove the first N bytes inplace from a file. Pick the largest value for N that does fit in your available space.
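To check the available space mentioned above, something like the following sketch may help. The function name has_room is invented for this example, and the check is only approximate: it assumes sed needs about as much free space as the input file occupies, in the same directory as that file.

```shell
# has_room FILE (hypothetical helper): succeed if the filesystem holding
# FILE has at least FILE's size available, roughly what a temporary copy needs.
has_room() {
    # File size in KB, rounded up.
    need_kb=$(( ($(wc -c < "$1") + 1023) / 1024 ))
    # Fourth column of POSIX-format df output is the available space in KB.
    free_kb=$(df -Pk "$(dirname "$1")" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$need_kb" ]
}
```

Usage would be along the lines of: has_room file.json && sed -i.bak 'script' file.json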

Ed Morton
  • Your reasoning is logical, but I have 94GB left on my hard drive and 16GB of RAM (I'm guessing sed loads the entire file into memory), so I guess that is not the issue. – kongshem Mar 07 '16 at 14:13
  • No, `sed` doesn't load the entire file into memory, it just copies it to a tmp file just like `perl` and `gawk` do for the poorly named "inplace editing". `ed` is the only command I know of that actually loads the entire file into memory. If that's not the issue then idk, sorry - good luck! – Ed Morton Mar 07 '16 at 14:16
  • OK, I edited my answer to include that as a possibility. – Ed Morton Mar 07 '16 at 14:31
  • `sed 's/x/x/' file.json > newfile` duplicates the file; it does not perform the s/x/y/ operation. – kongshem Mar 07 '16 at 14:35
  • It doesn't say s/x/y, it says s/x/x. It's not supposed to change anything, just show whether sed is messing up the file somehow. It looks like we've identified the issue though: your whole file is a single line, so the `d` op is deleting the whole line, i.e. the whole file. – Ed Morton Mar 07 '16 at 14:51