
I have a huge text file (120MB) and I would like to modify only a few lines at the end of it. So far I have done this with Vi: I open the whole document and scroll down. But loading such a big file, even in Vi, takes a long time. Is there a way to select lines and modify them from a Unix command-line shell?

Thanks

user3635159
  • Related: [How to edit 300 GB text file (genomics data)?](http://stackoverflow.com/q/16900721/1983854). If you know the line numbers, it is quite straightforward. – fedorqui Oct 07 '14 at 11:46
  • I know you can use libc/syscalls/PHP/ANSI C to create a file descriptor, set its cursor to a specific address and only write those bytes. It's called `fseek()`. – Daniel W. Oct 07 '14 at 11:48
  • It's straightforward to do with awk, sed, ed, ... Can you describe exactly what edits you need to make? – glenn jackman Oct 07 '14 at 12:44
  • I need to add some ending tags at the end of the file [e.g. ...] – user3635159 Oct 07 '14 at 13:27
  • On modern hardware, opening a 120MB file in Vim shouldn't be a problem. I use `set noswapfile` and `set nobackup` in my .vimrc file and have no problem opening large files. Once the file is open, use `G` to jump to the end, and start editing. – Richard Neish Oct 07 '14 at 13:33
  • Thanks all for your replies. – user3635159 Oct 07 '14 at 17:53
  • If you're literally only appending to the very end, that's even easier than editing lines near the end in-place: `echo "..." >>file` – Charles Duffy Oct 08 '14 at 03:57
  • I find Charles' reply the most attractive so far. Can you also combine the `sed` command with `echo` - for example, to replace ... with ...? – user3635159 Oct 08 '14 at 07:15
  • @user3635159, no need for echo. If you have GNU sed, for instance: `sed -i "s@...@...@"`. That said, that'll change it everywhere in the document; if you know the line number for the single line where it needs changing, though, that can be passed to `sed` too. To only edit the area between the 15,000th line and the end of file, for instance: `sed -i '15000,$s@...@...@'`. If you have a `sed` without GNU extensions, the *standard* (and actually better) tool for this kind of in-place edit is `ex` (a sketch follows these comments). – Charles Duffy Oct 08 '14 at 13:22
  • ...also, you can test `ex` commands in vim interactively, since its commands are the same thing as `vi`'s command-mode operations. (There's actually a good chance that your system's `ex` binary is provided by the `vim` package; I only know of a few Linux distros where it's not). – Charles Duffy Oct 08 '14 at 13:23
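
A sketch of the `ex` approach mentioned in the comments above; the file name `big.txt` and the tag names are placeholders, not taken from the question:

```sh
# replace a tag on the last line only, in place
printf '%s\n' '$s@oldtag@newtag@' 'wq' | ex -s big.txt

# or append closing tags after the last line
printf '%s\n' '$a' '</endtag>' '.' 'wq' | ex -s big.txt
```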

1 Answer


Chopping then Catting

You could always use split with any of its various options to make smaller files, then edit the last file, and concatenate them back together with cat when you're done.
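
A rough sketch of that approach; the file name `big.txt` and the piece size are placeholders:

```sh
split -l 1000000 big.txt part_       # cut into 1M-line pieces: part_aa, part_ab, ...
vi "$(ls part_* | tail -n 1)"        # edit only the last (small) piece
cat part_* > big.txt                 # stitch the pieces back together
rm part_*
```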

Or use csplit to chop off only the end piece.
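
For example (the line number is hypothetical), `csplit` can cut the file in two at a given line so only the tail needs editing:

```sh
csplit big.txt 1000000      # xx00 = lines 1..999999, xx01 = line 1000000 to EOF
vi xx01                     # edit just the end piece
cat xx00 xx01 > big.txt
rm xx00 xx01
```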

A Better Way

As Charles Duffy points out, it would be more efficient to use dd to seek near the end of the file, read that small piece out and edit it, then use dd again to tack the edited piece back on.

This is especially important when dealing with really huge files.
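
A minimal sketch of that idea, assuming the lines you need to change fall within the last 2048 bytes; `big.txt` is a placeholder name, and `truncate` is from GNU coreutils:

```sh
size=$(wc -c < big.txt)       # total size in bytes
skip=$((size - 2048))

dd if=big.txt of=tail.txt bs=1 skip="$skip"   # copy out only the last 2 KB
vi tail.txt                                   # edit the tiny piece

truncate -s "$skip" big.txt                   # chop off the old tail in place
cat tail.txt >> big.txt                       # append the edited tail
```

Only the last couple of kilobytes are ever rewritten; the rest of the file is never touched.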

Changing the Editor

With Vim you can also do `set noswapfile`, `set nobackup` and `syntax off` to reduce the load of the editor on your system.
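
For a one-off session, those same settings can be passed on the command line instead of being put in your .vimrc (the file name is a placeholder):

```sh
vim -c 'set noswapfile' -c 'set nobackup' -c 'syntax off' big.txt
```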

Or you could try `vim -u NONE file.csv` to start Vim without loading your vimrc or any plugins.

Note: The first two methods assume that you don't yet know exactly where (i.e. by line number or pattern) you want to edit.

Travis
  • The split/concat approach is needlessly inefficient, compared to editing in-place; if this is done correctly, there should be no need to rewrite anything prior to the content being edited. – Charles Duffy Oct 08 '14 at 03:59
  • It could be a fix depending on how large the file is, but you're right. – Travis Oct 08 '14 at 04:01
  • ...using `dd` to seek to 2k from the end, read off the content there, modify it in-place, and then dd it back onto the end is perhaps a more reasonable alternative to `split` -- that way you're only rewriting one block at the end of the file, leaving the rest completely in-place. Even then, though, that's still more work than having `ex` or another UNIX tool go straight to the line and modify it for you. – Charles Duffy Oct 08 '14 at 04:01
  • Sure, or even `csplit` into two then `cat` again would be better than `split` proper. – Travis Oct 08 '14 at 04:17
  • Anything that involves `cat` is defeating the point, because it means you're rewriting parts of the file that don't need to change. Pretend it was a 100GB file instead of a 200MB one, and the point is magnified; if you're doing it right, you're modifying only the last block, and not rewriting anything else or consuming space for temporary files beyond your block size (a matter of a few kb). – Charles Duffy Oct 08 '14 at 04:18
  • ...so, if you tell `dd` to copy out everything after 500 bytes before the file's end, edit that 500-byte file, and then tell `dd` to put the edited version back in place, then you've only modified a single 2k (or 4k, or whatever your filesystem was formatted with) block. Very fast, no temp space usage beyond a single block size. Use some `split` variant to create 200mb of temporary files, and recombine them to rewrite the 200mb target file, and you've done 400mb of IO, instead of 8kb of IO max. – Charles Duffy Oct 08 '14 at 04:22