55

I have to do some find-and-replace tasks on a rather big file, about 47 GB in size.

Does anybody know how to do this? I tried tools like TextCrawler, EditPad Lite and more, but nothing supports a file this large.

I'm assuming this can be done via the command line.

Do you have an idea how this can be accomplished?

Shrayas

7 Answers

62

Sed (stream editor for filtering and transforming text) is your friend.

sed -i 's/old text/new text/g' file

Sed performs text transformations in a single pass.

Ryan
  • 1
    What's the size limit of a file it can address? Has it any limits according the architecture (32/64-bit)? – sarat Aug 05 '11 at 05:20
  • 1
    Take a look at http://sed.sourceforge.net/sedfaq6.html -> looks like there is no limit to be concerned about. – Ryan Aug 05 '11 at 14:17
  • Looks good, mate, but I run Windows 7. sed is a Unix utility, no? – Shrayas Aug 05 '11 at 18:42
  • 4
    Traditionally, yes, but it's an open source command line tool, available on most platforms. A quick google points to http://gnuwin32.sourceforge.net/packages/sed.htm. Might need some footwork to get it going but it could work for you. – Ryan Aug 05 '11 at 19:14
  • Here is another list of references which may help you: http://stackoverflow.com/questions/127318/is-there-any-sed-like-utility-for-cmd-exe – Ryan Aug 05 '11 at 19:16
  • Awesomeness. Thanks a ton @RyanBates ! Will surely try this out :) – Shrayas Aug 08 '11 at 03:57
  • This is an old post, but perhaps that would still help some Windows users.... you can simply buy a huge computing instance at amazon and run the sed tool for that - it's really cheap and will help you to deal with the file. – Razique Mar 15 '16 at 02:49
  • Get Cygwin... it also contains Windows ports of other apps; you will need to add it to your PATH. – Palcente Jul 22 '16 at 11:59
  • As long as your script doesn't explicitly collect stuff into memory (which a beginner would not know how to do anyway), `sed` processes a single line at a time, so file size is not an issue *per se.* Really long input lines could be, but few files have lines that are gigabytes or terabytes long. – tripleee Mar 10 '17 at 15:19
  • Re: sed, see accepted answer here, https://stackoverflow.com/questions/11145270/how-to-replace-an-entire-line-in-a-text-file-by-line-number >> sed streams the entire file, but as noted in this answer, specifying the line number (if known) helps: in my case, a ~2-fold increase in execution speed (GNU sed 4.5). You can grep -n or ripgrep (rg) to find line numbers, based on pattern searches – Victoria Stuart Apr 24 '18 at 19:02
  • I've tried multiple different solutions for my 1.5GB file (so still quite small) - notepad, notepad++, below mentioned FART, but not even FART gave me a reliable result - had to run it multiple times and it still didn't replace everything. WTF. Then I tried linux subsystem on W10 and ran "sed". Finally, fast and reliable. Thanks. – podvlada Dec 09 '20 at 09:45
49

I use FART - Find And Replace Text by Lionello Lunesu.

It works very well on Windows 7 x64.

You can find and replace the text using this command:

fart -c big_filename.txt "find_this_text" "replace_to_this"

github

Kokizzu
JorgeKlemm
10

On Unix or Mac:

sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt

Fast and easy.

1

I solved the problem by first using split to break the large file into smaller files of 100 MB each.

  • 3
    Hi, welcome to Stack Overflow! This answer is a bit unclear; could you give the actual commands or exact steps that you used? – Ryan M Oct 29 '20 at 00:24
  • In Bash: split -d -b 100000000 foo. Then substitute with "sed" and check the beginning and the end of each resulting file with the "foo" prefix; if the string to substitute was cut at a chunk boundary, substitute it manually. Finally, reassemble the big file with the command: for FILE in `ls -1 foo*`; do cat "${FILE}" >> ; done – Antonio Vandré P F Gomes Oct 30 '20 at 15:53
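The manual check at the chunk boundaries can be avoided by carrying the tail of each chunk over into the next read. Below is a rough Python sketch of that idea, assuming a literal (non-regex) search string. `replace_stream` and `chunk_size` are names invented for this example, not part of any tool mentioned in this thread. It holds back the last `len(old) - 1` bytes of each buffer, so a match that was cut in two by a chunk boundary is still found once the next chunk arrives:

```python
import io


def replace_stream(src, dst, old, new, chunk_size=1 << 20):
    """Copy binary stream src to dst, replacing old with new (both bytes).

    Reads fixed-size chunks, so memory use does not depend on file size
    or on line length.
    """
    if not old:
        raise ValueError("search string must not be empty")
    keep = len(old) - 1  # longest partial match that can sit at the end
    buf = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pos = 0
        # Replace every complete match currently in the buffer.
        while True:
            i = buf.find(old, pos)
            if i == -1:
                break
            dst.write(buf[pos:i])
            dst.write(new)
            pos = i + len(old)
        # Hold back the tail: it may be the start of a match that
        # continues in the next chunk.
        safe = max(pos, len(buf) - keep)
        dst.write(buf[pos:safe])
        buf = buf[safe:]
    dst.write(buf)  # leftover is shorter than old, so it cannot match


if __name__ == "__main__":
    # Demo with a deliberately tiny chunk size so matches get cut.
    source = io.BytesIO(b"xxfoobarxxfoobar")
    target = io.BytesIO()
    replace_stream(source, target, b"foobar", b"BAZ", chunk_size=3)
    print(target.getvalue())
```

For a real file, `src` and `dst` would be files opened with `open(path, "rb")` and `open(path, "wb")`; no splitting or reassembly step is needed.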
-1

If you are using a Unix-like system, you can use cat | sed to do this:

cat hosted_domains.txt | sed s/com/net/g

This example replaces com with net in a list of domain names; you can then redirect the output to a file.

Devraj
  • 10
    You should skip `cat` and write `sed 's/foo/bar/g' FILE` instead. – Zsolt Botykai Aug 05 '11 at 05:13
  • For a beginner question, maybe also explain that `/g` is only necessary if there can be multiple occurrences on a single line. Very frequently, the default behavior -- to only replace the first occurrence -- is exactly what you want, and adding a `/g` accomplishes nothing, except maybe making it a little slower; or, in the worst case, a bug. (And yes, lose the [useless use of `cat`](http://www.iki.fi/era/unix/award.html).) – tripleee Jan 25 '16 at 06:55
  • @tripleee How could you know that only replacing first occurrence per line is what anybody wants? If I want to replace something in a text I usually want to replace all occurrences. Not just the first per line. – The incredible Jan Mar 10 '17 at 14:24
  • If I positively *knew* I would post a separate answer. I'm merely pointing out that this should probably be explained, to help those who need this answer but are unfamiliar with the tool. – tripleee Mar 10 '17 at 15:09
  • Like @ZsoltBotykai said, this is a [UUOC](http://porkmail.org/era/unix/award.html) – Anthony Aug 29 '17 at 14:18
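To make the `/g` discussion concrete, here is a small Python illustration of "first occurrence per line" versus "every occurrence". It uses plain string replacement rather than sed's regular expressions, and the sample line is made up for this example:

```python
line = "example.com, example.com"

# Like sed 's/com/net/': only the first occurrence on the line changes.
first_only = line.replace("com", "net", 1)

# Like sed 's/com/net/g': every occurrence on the line changes.
all_matches = line.replace("com", "net")

print(first_only)   # example.net, example.com
print(all_matches)  # example.net, example.net
```

If each line holds a single domain name, both variants give the same result, which is the case tripleee describes.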
-2

For me, none of the tools suggested here worked well. TextCrawler ate all my computer's memory, sed didn't work at all, EditPad complained about memory...

The solution is: create your own script in python, perl or even C++.

Or use the tool PowerGrep, this is the easiest and fastest option.

I haven't tried FART; it's command-line only and maybe not very friendly.
Some hex editors, such as UltraEdit, also work well.
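A minimal sketch of the "write your own script" route in Python, assuming the text to replace never spans a line break and no single line is too large to fit in memory. The function name `replace_in_file` and the file names in the demo are made up for illustration. It reads the input one line at a time and writes to a second file, so memory use depends on the longest line, not on the file size:

```python
import os
import tempfile


def replace_in_file(src_path, dst_path, old, new):
    """Stream src_path to dst_path, replacing every old with new (bytes)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # one line at a time, never the whole file
            dst.write(line.replace(old, new))


if __name__ == "__main__":
    # Tiny demo on a temporary file; a real run would point at the big file.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "oldfile.txt")
    dst = os.path.join(workdir, "newfile.txt")
    with open(src, "wb") as f:
        f.write(b"old text here\nnothing to do\nold text, old text\n")
    replace_in_file(src, dst, b"old text", b"new text")
    with open(dst, "rb") as f:
        print(f.read().decode())
```

This is the same idea as `sed 's/old/new/g' oldfile > newfile`, only with literal byte strings instead of regular expressions. It needs enough free disk space for a second copy of the file.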

skan
  • Replacements can be done on huge files with UltraEdit by opening the file without a temporary file, which also means the replacements are made without undo recording. Even better, use __Replace in Files__ with *In files/types* set to the file name and *Directory* set to the file's path (as an example), and don't open the file in UltraEdit at all. See in the UE forum [How to run fast a Perl regular expression Replace All on a huge file?](https://www.ultraedit.com/forums/viewtopic.php?f=8&t=16401) and [Find and replace HEX in files](https://www.ultraedit.com/forums/viewtopic.php?f=8&t=15990). – Mofi Jul 23 '16 at 18:09
  • I already had the temporary file disabled in Ultraedit. I didn't know the "Replace in Files" option though. – skan Jul 24 '16 at 09:50
  • UltraEdit was a piece of cake for a sql file over 1GB, replacing 200k+ occurrences of a string. – Matthew Blancarte Jan 25 '17 at 19:28
  • 1
    If `sed` "doesn't work", why do you think Perl would work? For the record, `perl -pe 's/foo/bar/g' file >newfile` i.e. exactly like `sed`, only the regex support is more versatile. – tripleee Mar 10 '17 at 15:12
  • 1
    `sed` was written in C over 40 years ago - I *highly* doubt writing your own find and replace python script would be faster. – max kaplan Aug 28 '17 at 18:58
-2

I used

sed 's/[nN]//g' oldfile.fasta > newfile.fasta

to remove every n and N from my 7 GB file.

If I omitted the > newfile.fasta redirection, it took ages, because it scrolled every line of the file up the screen.

With > newfile.fasta, it ran in a matter of seconds on an Ubuntu server.

Thomas Fritsch
Julian