I have spent more than an hour trying to find out how to do a batch find and replace in Terminal on Mac OS X. I found different versions of code, but am having difficulty making them work. So far, I have found one command that works, but only for one term/character.

What I want to do is find and replace multiple characters in one text file, all at the same time.

For example:

Find §, replace with ก
Find Ø, replace with ด
Find ≠, replace with ห
Find £, replace with ้

The code that works so far is (but only for one character):

sed -i '' s/Ø/ด/ [textfile.txt]

Could anyone please help me out?

SomethingDark
samseva
  • In the first line of your example I read: replace a "paragraph sign" with "something that's not clear", possibly from a non-Latin alphabet. Is the paragraph sign what you meant, or is there a problem with the encoding of the characters? – gboffi Apr 24 '16 at 15:05
  • It is due to non-Latin characters. The find ones I think are Latin, but not the replace ones. – samseva Apr 24 '16 at 15:19
  • Yes, the paragraph sign is what I meant (as well as the "not equal" and pound symbols). – samseva Apr 24 '16 at 15:21

1 Answer

Your usage pattern is so common that there is a specific utility for it, namely tr:

tr abc ABC < input.txt > output.txt

where you use two strings (here abc and ABC) to tell tr which substitutions you want (here, substitute a with A, b with B, etc.).
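As a minimal sketch of the idea, the following maps three ASCII letters in one pass. The sample text is made up for illustration. Whether tr copes with multi-byte characters such as the Thai letters in the question depends on the implementation and the locale (GNU tr, for example, works byte by byte), so try it on a copy of your file first.

```shell
# Map a->A, b->B, c->C in a single pass; all other characters pass through unchanged.
out=$(printf 'a cab\n' | tr abc ABC)
printf '%s\n' "$out"
```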


sed is MUCH more general than tr. To search and replace the first occurrence in every line, use

sed 's/src1/rep1/' < in > out

To search and replace every occurrence in every line, add a g switch to the s command:

sed 's/src1/rep1/g' < in > out
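To make the effect of the g switch visible, here is a small sketch that runs both forms on the same two made-up lines. The variable names are only for illustration.

```shell
# Without g: only the first match on each line is replaced.
first_only=$(printf 'a-a-a\na-a\n' | sed 's/a/b/')

# With g: every match on each line is replaced.
all=$(printf 'a-a-a\na-a\n' | sed 's/a/b/g')

printf '%s\n---\n%s\n' "$first_only" "$all"
```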

Finally, to do multiple search-and-replace operations, separate the s commands with semicolons:

sed 's/src1/rep1/g;s/src2/rep2/;s/src3/rep3/g' < in > out

Note that in the above example I used the g switch (line-wise global substitution) for the 1st and the 3rd find-and-replace and not for the 2nd one. Your usage may be different, but I hope you've spotted the pattern.
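For a long list of replacements, one option is to keep one s command per line in a script file and pass it to sed with -f. The sketch below uses the four mappings from the question. The file names are hypothetical, the files are created in a temporary directory, and it assumes everything is saved as UTF-8.

```shell
# Work in a throwaway directory; all file names here are placeholders.
dir=$(mktemp -d)

# One substitution per line, each with g so every occurrence is replaced.
cat > "$dir/map.sed" <<'EOF'
s/§/ก/g
s/Ø/ด/g
s/≠/ห/g
s/£/้/g
EOF

# Sample input standing in for the real text file.
printf '%s\n' '§Ø≠£ §Ø' > "$dir/in.txt"

# Read from in.txt and write to out.txt; the input file is left untouched.
sed -f "$dir/map.sed" < "$dir/in.txt" > "$dir/out.txt"
result=$(cat "$dir/out.txt")
printf '%s\n' "$result"

# Clean up the temporary files.
rm -r "$dir"
```

Keeping the mappings in a file also avoids retyping the quotes in Terminal each time you add a character.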

gboffi
  • I tried the code and it works. However, would you know what code to add so that you can find and search multiple words/characters all at once? – samseva Apr 24 '16 at 15:16
  • Tried the code. I'll present all the information to make sure everything is okay. 1. I created a txt file called "TestBefore.txt" with the text "¢‡≠§ØÄɧæõ/ɇΩ" in it. 2. Entered the following code in Terminal (removed user): `sed 's/¢/ก/g;s/§/ด/;s/Ä/า/g' > /Users/[...]/Desktop/TestBefore.txt > /Users/[...]/Desktop/TestAfter.txt`. 3. This created a new "TestAfter.txt" file, but with no text in it. This also removed all the text in the initial "TestBefore.txt" file. How come this is happening? – samseva Apr 24 '16 at 16:36
  • My mistake: in the 1st and 2nd examples I used ` < in > out`, but in the 3rd example (the one you copied) I mistyped `> in > out`, and that's what you copied in your test, with two "greater than" signs. The "less than" sign stands for "take input from the following file", while the "greater than" sign stands for "put output in the following file", so what we did is wrong. Try `< /Users/[...]/Desktop/TestBefore.txt > /Users/[...]/Desktop/TestAfter.txt`. Please note that I'm going to edit my answer to correct the mistake. – gboffi Apr 24 '16 at 21:10
  • It works now, but only for two replace characters. I need to use this code to batch find and replace many characters—over 20. I tried to simply copy and paste `;s/src3/rep3/g` so that the code looks like `sed 's/src1/rep1/g;s/src2/rep2/;s/src3/rep3/g;s/src3/rep3/g' < in > out`, but Terminal gives the following errors: `sed: 1: "’s/¢/ก/g": invalid command code ? -bash: s/¿/ด/: No such file or directory -bash: s/ƒ/ผ/g: No such file or directory -bash: s/Ä/ฟ/g’: No such file or directory` – samseva Apr 24 '16 at 22:40
  • Also, why is the `g` left out for some instances of the find and repeat and added for others (`;s/src3/rep3/g` vs. `;s/src3/rep3/`)? Maybe this is why it isn't working for more than two replace characters? – samseva Apr 24 '16 at 22:42
  • Could you spot the difference between `'`, the character that I've used to quote the commands, and the character `’` that you have used in your terminal command? It is not the same... and it is a detail that makes all the difference because `'` is recognized by the command interpreter as introducing a string that is not to be interpreted but passed literally to the `sed` command. – gboffi Apr 25 '16 at 06:59
  • Ah, yes. For some reason, it keeps changing to `’`, even when just copying and pasting in the txt file. The code almost completely works now. I tried it with a different line of characters and not all of them converted. The input text was `–Å¿“úó¡Ü¿ô—¶‰®` and the code was `sed 's/–/เ/g;s/Å/ข/;s/¿/า/g;s/“/ไ/g;s/ú/ป/g;s/ó/ท/g;s/¡/ำ/g;s/Ü/ง/g;s/ô/น/g;s/—/แ/g;s/¶/ล/g;s/‰/ ้/g;s/®/ว/g' < /Users/[...]/Desktop/Code/CodeTestInput.txt > /Users/[...]/Desktop/Output/CodeTestOutput.txt`. The output was `เÅาไúóำÜาôแล ้ว` (Å, ú, ó, Ü, and ô didn't convert). Does this maybe have to do with the `g`? – samseva Apr 25 '16 at 13:41
  • Hi gboffi, would you know why the above is happening? – samseva Apr 30 '16 at 12:06
  • IMO, for your use case (translating single characters, each of them) the right tool is `tr` rather than `sed`. Further, I'm under the impression that you want to change the encoding of a file from something (related to Thai?) to Unicode, and for such a task there are very specialized and easy-to-use tools. That said, no, I haven't the faintest idea what's going on... I tried to reproduce your example (not on Mac OS) but everything is OK for me. – gboffi Apr 30 '16 at 21:42
  • Yes, that is exactly what I want to do. I'm trying to convert incorrectly rendered Thai script (gibberish characters I've mentioned previously) back into Thai script. Regarding software, after more than one hour of searching, I found a $5 application, but I think depending on a piece of software (crashes, updates, bugs, cost, etc.), when you can instead use a single line of code isn't as good. – samseva May 01 '16 at 14:57
  • Aha, it works! Did many more tests and everything works 100%. Thank you very much, gboffi. You mentioned in your comment that `tr` would be better than `sed`. Would it be better for what I want to do? Also, I've been using the `s/¶/ล/g` and `s/ô/น/` (without the g) completely arbitrarily, copying and pasting whichever happens to be under my mouse pointer. What is the difference between the `s/ô/น/g` and `s/ô/น/` (without the g) and which should be used? – samseva May 01 '16 at 15:01
  • For anyone interested, here is the correct code: `sed 's/ä/ช/g;s/‡/่/;s/®/ว/g;s/£/ย/g;s/õ/บ/g;s/Ø/อ/g;s/Ä/ก/g;s/É/ค/g;s/À/ุ/g;s/ì/ณ/;s/ï/ต/g;s/¬/ิ/g;s/ ̋/๋/g;s/¢/ม/g;s/”/ใ/g;s/≠/ห/g;s/‰/้/g;s/‘/โ/g;s/ó/ท/g;s/§/ร/g;s/ñ/ถ/g;s/∆/ึ/g;s/Ü/ง/g;s/â/ฉ/g;s/æ/ั/g;s/ô/น/g;s/î/ด/g;s/Ω/ะ/g' < /Users/[...]/Desktop/Code/CodeTestInput.txt > /Users/[...]/Desktop/Output/CodeTestOutput.txt` (obviously with different file paths and characters) – samseva May 01 '16 at 15:02
  • Referring to your second-to-last comment: an `s/a/b/` command changes **only** the 1st instance of `a` (in every line of input) into `b`, while `s/a/b/g` changes all instances of `a` into `b` (`g` is a _command modifier_ that stands for **g**lobally). I think `tr` is more appropriate because it's more compact, e.g. `tr abc nop` vs `sed 's/a/n/g;s/b/o/g;s/c/p/g'`. Utilities tailored to what you really want to do are `iconv`, `recode` and `enca`; you should have at least `iconv` in Terminal. Have a look [here](http://stackoverflow.com/q/64860/2749397). Glad to see that you solved it anyway. Ciao – gboffi May 02 '16 at 09:21
  • Thank you for your help, gboffi. – samseva May 02 '16 at 14:25
  • I would give you an up-vote, but it isn't allowing me to do so. :( – samseva May 02 '16 at 14:29