I want to run a find and replace using a series of value pairs taken from a file (or two files, if that makes the task any easier). The find and replace strings are literal ones, not regexes in the practical sense. At the moment the file is tab-delimited, findstring \t replacestring, one pair per line, but I can change that as required.

I know a little about regex but with Unix commands I really need clear "copy and paste" instructions. Earlier in this project I was pleased to discover grep -f to get find strings from a file, but it seems that grep can't do the same thing for the replace strings.

Can I do this with a mixture of grep, sed and so on? The thread above explains how to pipe grep to sed, but then I need to tell sed how to read replace strings from the file.

I'm on macOS (with homebrew) if that makes a difference.

1 Answer

You can put a list of sed commands, one per line, in a file called commands.sed:

s|cat|cats|g
s|dog|dogs|g
s|person|people|g

and run it on some input with:

echo "House mouse cat dog person dog person" | sed -f commands.sed

and it will replace cat with cats, dog with dogs and person with people, producing:

House mouse cats dogs people dogs people

So we want to turn your file of substitutions into a command file like that, using sed itself. Suppose your replacements file subs.txt contains lines like these, with the two strings on each line separated by a TAB:

cat cats
dog dogs
person  people

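You can convert it into a command file with the following. The $'...' quoting, which turns \t into a real TAB, needs bash or zsh: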

sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt > commands.sed

and then you can apply it with:

sed -f commands.sed SomeFile
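One caveat: sed still treats each find string as a regular expression, and treats & and \ specially in the replacement, so pairs containing characters such as . or * may not behave as literals. The following is only a sketch of one way to escape them, and it swaps in awk for the conversion step rather than the sed one-liner above. The sample pairs and the variable names workdir and escaped_result are made up for illustration.

```shell
# Hypothetical sample data: literal pairs containing characters that sed
# would otherwise treat specially (. and * on the find side, & and | on
# the replace side).
workdir=$(mktemp -d)
printf 'a.b\tX&Y\n' > "$workdir/subs.txt"
printf '1*\tone|star\n' >> "$workdir/subs.txt"

# Build the sed command file, escaping each field separately.
awk -F'\t' '{
  find = $1; repl = $2
  gsub(/[\\.*^$[|]/, "\\\\&", find)   # regex metacharacters and the | delimiter
  gsub(/[\\&|]/, "\\\\&", repl)       # backslash, & and the | delimiter
  print "s|" find "|" repl "|g"
}' "$workdir/subs.txt" > "$workdir/commands.sed"

# Apply it: only the literal strings "a.b" and "1*" should be replaced.
escaped_result=$(printf 'a.b axb 1* 11\n' | sed -f "$workdir/commands.sed")
echo "$escaped_result"
```

Here axb and 11 are left alone, which would not be the case if a.b and 1* were passed to sed unescaped.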

Rather than creating a file containing the commands, we can use process substitution to generate them dynamically and do it all in one go:

echo "House mouse cat dog person dog person" | sed -f <(sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt)
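If you want the changes written back to the file itself, one cautious option is sed's -i flag with a backup suffix, which keeps a copy of the original next to the edited file. This is a minimal sketch with made-up sample data. The suffix is attached directly to -i, a form that both the BSD sed shipped with macOS and GNU sed accept. The names demo_dir and SomeFile here are only for illustration.

```shell
# Hypothetical sample data for the demonstration.
demo_dir=$(mktemp -d)
printf 's|cat|cats|g\ns|dog|dogs|g\n' > "$demo_dir/commands.sed"
printf 'cat dog\n' > "$demo_dir/SomeFile"

# Edit in place, leaving the untouched original in SomeFile.bak.
sed -i.bak -f "$demo_dir/commands.sed" "$demo_dir/SomeFile"

cat "$demo_dir/SomeFile"       # the edited file
cat "$demo_dir/SomeFile.bak"   # the original, kept as a backup
```

If something goes wrong, restore from the .bak copy rather than running the command file a second time on the already edited file.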
Mark Setchell
  • @CharlesButcher: this is a good solution, but note what happens once you substitute `dogs` for `dog`. Now `dog` will match `dogs`, and if you run the script again, you'll wind up with `dogss` ;-) The moral being, always work on copies of your data. If you discover a mistake, rerunning a program can compound the problem ;-! Good luck to all. – shellter Jan 25 '20 at 14:35
  • Note that if you have tens of thousands of files to process, you can generate the command file just once and apply it to all files in parallel with **GNU Parallel** - just ask. – Mark Setchell Jan 25 '20 at 14:41
  • @MarkSetchell Thank you so much for such a careful, patient and accurate walkthrough. It worked first try – well, second, but that was because my `subs.txt` had an unwanted space before the tab. A great result, and I've learned a lot for the future. – Charles Butcher Jan 25 '20 at 19:42
  • @shellter A good point, thank you! I have messed up so many times with search-and-replace in word processors that I've learned to be careful… – Charles Butcher Jan 25 '20 at 19:43