Find and replace text in a 47GB large file

Question

I have to do some find and replace tasks on a rather big file, about 47 GB in size.

Does anybody know how to do this? I tried using tools like TextCrawler, EditpadLite and more, but nothing supports a file this large.

I'm assuming this can be done via the command line.

Do you have an idea how this can be accomplished?

If you're running on a 64-bit architecture, the size of the file isn't really a big deal unless the tools impose some restrictions. – sarat, Aug 07 '11 at 15:09
Well, if your editor/tool tries to load the file into RAM, that is a serious restriction. – My1, Mar 05 '18 at 11:43

score 62 · Accepted Answer · edited Dec 21 '17 at 03:45

Sed (stream editor for filtering and transforming text) is your friend.

sed -i 's/old text/new text/g' file

Sed performs text transformations in a single pass.
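
A minimal sketch of the in-place form on a throwaway file, assuming GNU sed (`-i` is not part of POSIX, and BSD/macOS sed expects a suffix argument, which the `-i.bak` spelling satisfies). The file name `demo.txt` and its contents are made up for illustration. The `.bak` suffix keeps a copy of the original, which for a 47 GB file means another 47 GB of disk space.

```shell
#!/bin/sh
# Hypothetical demo file; substitute your own path for real use.
workdir=$(mktemp -d)
printf 'old text here\nnothing to change\nold text and old text\n' > "$workdir/demo.txt"

# -i edits the file in place; the .bak suffix keeps the original as demo.txt.bak.
sed -i.bak 's/old text/new text/g' "$workdir/demo.txt"

result=$(cat "$workdir/demo.txt")
printf '%s\n' "$result"
```

If the backup is not wanted, GNU sed also accepts a bare `-i`; it is safer to check the result before deleting the `.bak` copy.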

answered Aug 05 '11 at 05:11

Ryan

What's the size limit of a file it can address? Does it have any limits depending on the architecture (32/64-bit)? – sarat Aug 05 '11 at 05:20
Take a look at http://sed.sourceforge.net/sedfaq6.html -> looks like there is no limit to be concerned about. – Ryan Aug 05 '11 at 14:17
Looks good mate, but I run Windows Seven. SED is a Unix util, no? – Shrayas Aug 05 '11 at 18:42
Traditionally, yes, but it's an open source command line tool, available on most platforms. A quick google points to http://gnuwin32.sourceforge.net/packages/sed.htm. Might need some footwork to get it going but it could work for you. – Ryan Aug 05 '11 at 19:14
Here is another list of references which may help you: http://stackoverflow.com/questions/127318/is-there-any-sed-like-utility-for-cmd-exe – Ryan Aug 05 '11 at 19:16
Awesomeness. Thanks a ton @RyanBates! Will surely try this out :) – Shrayas Aug 08 '11 at 03:57
This is an old post, but perhaps that would still help some Windows users.... you can simply buy a huge computing instance at amazon and run the sed tool for that - it's really cheap and will help you to deal with the file. – Razique Mar 15 '16 at 02:49
Get Cygwin... it also contains Windows ports of other apps; you will need to add it to your PATH. – Palcente Jul 22 '16 at 11:59
As long as your script doesn't explicitly collect stuff into memory (which a beginner would not know how to do anyway), `sed` processes a single line at a time, so file size is not an issue *per se.* Really long input lines could be, but few files have lines that are gigabytes or terabytes long. – tripleee Mar 10 '17 at 15:19
Re: sed, see accepted answer here, https://stackoverflow.com/questions/11145270/how-to-replace-an-entire-line-in-a-text-file-by-line-number >> sed streams the entire file, but as noted in this answer, specifying the line number (if known) helps: in my case, a ~2-fold increase in execution speed (GNU sed 4.5). You can grep -n or ripgrep (rg) to find line numbers, based on pattern searches – Victoria Stuart Apr 24 '18 at 19:02
I've tried multiple different solutions for my 1.5GB file (so still quite small) - notepad, notepad++, below mentioned FART, but not even FART gave me a reliable result - had to run it multiple times and it still didn't replace everything. WTF. Then I tried linux subsystem on W10 and ran "sed". Finally, fast and reliable. Thanks. – podvlada Dec 09 '20 at 09:45
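
The line-number tip from the comments above can be sketched as follows, assuming the match sits on a single known line. `grep -n` reports the line number, and prefixing the `s` command with that number restricts the substitution to that line; sed still reads the whole file. The file name and contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo file.
workdir=$(mktemp -d)
printf 'alpha\nold text\nbeta\n' > "$workdir/demo.txt"

# Find the number of the first line containing the pattern.
lineno=$(grep -n 'old text' "$workdir/demo.txt" | head -n 1 | cut -d: -f1)

# Apply the substitution only on that line; all other lines pass through untouched.
result=$(sed "${lineno}s/old text/new text/" "$workdir/demo.txt")
printf '%s\n' "$result"
```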

score 49 · Answer 2 · edited Feb 05 '21 at 14:10

I use FART - Find And Replace Text by Lionello Lunesu.

It works very well on Windows Seven x64.

You can find and replace the text using this command:

fart -c big_filename.txt "find_this_text" "replace_to_this"

github

edited Feb 05 '21 at 14:10

Kokizzu

answered Jun 04 '14 at 14:35

JorgeKlemm

That website is amazing – cowsay Sep 22 '16 at 15:43
Downloaded without thinking twice. I mean, who doesn't install something with a name as glorious as fart. – Rushat Rai Sep 03 '17 at 16:39
Used this to replace text in a 5 GB SQL file; it completed the task in less than 4 seconds. – Deepak Kamat Jul 03 '18 at 17:45
Used this to replace text in a 1.7 GB SQL file; it completed the task in around 10 seconds. – Maxim Mandrik Sep 03 '18 at 20:21
Is anyone able to tell me how to use this utility to remove every double quote from a .csv file? I've tried hard but there's no way to do it! – Power Engineering Feb 15 '19 at 18:03
fart HUGEFILE.sql "\"" "" --remove – JAGJ jdfoxito May 07 '19 at 17:19
"Sniiifff" Aaahhh... smells like Channel No. 5! Thanks for sharing this little and very useful tool. – Metafaniel Oct 29 '21 at 23:13

score 10 · Answer 3 · answered Feb 09 '17 at 09:35

On Unix or Mac:

sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt

fast and easy...
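
A minimal sketch of finishing the job, assuming the goal is to end up with the original file name: write to a new file, then move it over the original only if sed succeeded. The file names mirror the command above and the contents are invented. Until the `mv`, both copies exist, so this needs free disk space roughly equal to the file size; GNU `sed -i` also writes a temporary copy and renames it, so it does not avoid that cost.

```shell
#!/bin/sh
# Hypothetical demo file standing in for the real one.
workdir=$(mktemp -d)
printf 'oldstring one\noldstring two\n' > "$workdir/oldfile.txt"

# Write the transformed stream to a new file, and replace the original
# only if sed exited successfully.
if sed 's/oldstring/newstring/g' "$workdir/oldfile.txt" > "$workdir/newfile.txt"; then
    mv "$workdir/newfile.txt" "$workdir/oldfile.txt"
fi

result=$(cat "$workdir/oldfile.txt")
printf '%s\n' "$result"
```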

Ignacio Carvajal

Downvote: This duplicates an existing answer from 2011. – tripleee Mar 10 '17 at 15:10
@tripleee might be. but he only has given the command to run sed. so +1 – jafarbtech Jun 15 '17 at 04:46
I don't think duplicating a 47GB file (by writing to a new location) is a very wise idea. – Bruno Philipe Jan 18 '18 at 21:31

score 1 · Answer 4 · answered Oct 28 '20 at 14:27

I solved the problem by first using split to break the large file into smaller files of 100 MB each.

Antonio Vandré P F Gomes

Hi, welcome to Stack Overflow! This answer is a bit unclear; could you give the actual commands or exact steps that you used? – Ryan M Oct 29 '20 at 00:24
In Bash: split -d -b 100000000 foo. After that, substitute with "sed" and check the beginning and the end of each resulting file with the "foo" prefix; if the string to substitute was cut at a chunk boundary, substitute it manually. Then reassemble the big file with the command: for FILE in `ls -1 foo*`; do cat "${FILE}" >> ; done – Antonio Vandré P F Gomes Oct 30 '20 at 15:53
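
The split approach in this answer can be sketched as below, with one change that is my assumption rather than the author's method: splitting by line count (`-l`) instead of by bytes (`-b`), so that no line is cut between chunks and no manual repair at the boundaries is needed. The names `big.txt`, `chunk_` and `rebuilt.txt` are hypothetical, and the tiny chunk size is only for demonstration. Since sed already streams line by line, splitting mainly helps with tools that try to load a whole file.

```shell
#!/bin/sh
# Hypothetical demo file standing in for the large one.
workdir=$(mktemp -d)
printf 'old 1\nkeep 2\nold 3\nkeep 4\nold 5\n' > "$workdir/big.txt"

# Split on line boundaries (2 lines per chunk here; far more for a real file),
# so no line is cut in half between chunks.
split -l 2 "$workdir/big.txt" "$workdir/chunk_"

# Substitute in each chunk and append the results in glob (alphabetical) order,
# which matches the order of split's default aa, ab, ac... suffixes.
: > "$workdir/rebuilt.txt"
for f in "$workdir"/chunk_*; do
    sed 's/old/new/g' "$f" >> "$workdir/rebuilt.txt"
done

result=$(cat "$workdir/rebuilt.txt")
printf '%s\n' "$result"
```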

score -1 · Answer 5 · answered Aug 05 '11 at 05:11

If you are using a Unix-like system, you can use cat | sed to do this:

cat hosted_domains.txt | sed s/com/net/g

This example replaces com with net in a list of domain names; you can then redirect the output to a file.

Devraj

You should skip `cat` and write `sed 's/foo/bar/g' FILE` instead. – Zsolt Botykai Aug 05 '11 at 05:13
For a beginner question, maybe also explain that `/g` is only necessary if there can be multiple occurrences on a single line. Very frequently, the default behavior -- to only replace the first occurrence -- is exactly what you want, and adding a `/g` accomplishes nothing, except maybe making it a little slower; or, in the worst case, a bug. (And yes, lose the [useless use of `cat`](http://www.iki.fi/era/unix/award.html).) – tripleee Jan 25 '16 at 06:55
@tripleee How could you know that only replacing first occurrence per line is what anybody wants? If I want to replace something in a text I usually want to replace all occurrences. Not just the first per line. – The incredible Jan Mar 10 '17 at 14:24
If I positively *knew* I would post a separate answer. I'm merely pointing out that this should probably be explained, to help those who need this answer but are unfamiliar with the tool. – tripleee Mar 10 '17 at 15:09
Like @ZsoltBotykai said, this is a [UUOC](http://porkmail.org/era/unix/award.html) – Anthony Aug 29 '17 at 14:18
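
The effect of the `g` flag discussed in these comments can be seen on a single line of input. This is a small sketch with an invented sample line, not taken from the answer's `hosted_domains.txt`.

```shell
#!/bin/sh
# Invented sample line with two occurrences of "com".
line='example.com mirror.com'

# Without g: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/com/net/')

# With g: every match on each line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/com/net/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```

Note that the unanchored pattern `com` matches anywhere in a line, including inside a name such as `computer.com`; anchoring it, for example `sed 's/\.com$/.net/'`, limits the change to a trailing `.com`.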

score -2 · Answer 6 · edited Jul 24 '16 at 09:49

For me, none of the tools suggested here worked well. Textcrawler ate all my computer's memory, SED didn't work at all, Editpad complained about memory...

The solution is: create your own script in Python, Perl or even C++.

Or use the tool PowerGrep, which is the easiest and fastest option.

I haven't tried fart; it's command-line only and maybe not very friendly.
Some hex editors, such as UltraEdit, also work well.

answered Jul 22 '16 at 11:54

skan

Replaces can be done on huge files with UltraEdit by opening it without usage of a temporary file which results also in making the replaces without undo recording or even better with using __Replace in Files__ with *In files/types* being the file name and *Directory* specifying the file's path (as an example) and don't open the file at all in UltraEdit. See in UE forum [How to run fast a Perl regular expression Replace All on a huge file?](https://www.ultraedit.com/forums/viewtopic.php?f=8&t=16401) and [Find and replace HEX in files](https://www.ultraedit.com/forums/viewtopic.php?f=8&t=15990). – Mofi Jul 23 '16 at 18:09
I already had the temporary file disabled in Ultraedit. I didn't know the "Replace in Files" option though. – skan Jul 24 '16 at 09:50
UltraEdit was a piece of cake for a sql file over 1GB, replacing 200k+ occurrences of a string. – Matthew Blancarte Jan 25 '17 at 19:28
If `sed` "doesn't work", why do you think Perl would work? For the record, `perl -pe 's/foo/bar/g' file >newfile` i.e. exactly like `sed`, only the regex support is more versatile. – tripleee Mar 10 '17 at 15:12
`sed` was written in C over 40 years ago - I *highly* doubt writing your own find and replace python script would be faster. – max kaplan Aug 28 '17 at 18:58

score -2 · Answer 7 · edited Jan 18 '18 at 21:24

I used

sed 's/[nN]//g' oldfile.fasta > newfile.fasta

to remove all instances of n and N from my 7 GB file.

If I omitted the > newfile.fasta part, it took ages because it scrolled every line of the file up the screen.

With the > newfile.fasta redirect, it ran in a matter of seconds on an Ubuntu server.

edited Jan 18 '18 at 21:24

Thomas Fritsch

answered Jan 18 '18 at 19:26

Julian
