I have a large file with many columns and rows. I would like to replace an entire string in the first column based on a substring that's common to all strings I want to replace. Here's an example of what I have:

AAA_1765 866 HTG
AAA_1873 987 IGA
AAA_1922 413 BOK

I would like every string in the first column that contains the substring AAA_1 to be replaced entirely with another string, so that the result looks like this:

BBB_2 866 HTG
BBB_2 987 IGA
BBB_2 413 BOK

I've been working with sed to do a search/replace:

sed 's/^AAA_1*/BBB_2/' infile.txt >outfile.txt
sed 's/^AAA_1.*/BBB_2/' infile.txt >outfile.txt

But the first command replaces only the substring AAA_1 with BBB_2 and keeps the rest of the string (I want the full string replaced with BBB_2), while the second replaces the entire line with BBB_2 (I only want the string in column one replaced).
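
For reference, the two attempts can be reproduced on the sample rows roughly as follows. This is only a sketch: the rows are piped in from a shell variable (named `sample` here purely for illustration) instead of being read from infile.txt.

```shell
# The three sample rows from above, held in a variable instead of infile.txt.
sample='AAA_1765 866 HTG
AAA_1873 987 IGA
AAA_1922 413 BOK'

# Attempt 1: "1*" means "zero or more 1 characters", so only AAA_1 is matched
# and the remaining digits of the first column are left in place.
attempt1=$(printf '%s\n' "$sample" | sed 's/^AAA_1*/BBB_2/')

# Attempt 2: ".*" runs to the end of the line, so the whole row is replaced.
attempt2=$(printf '%s\n' "$sample" | sed 's/^AAA_1.*/BBB_2/')

printf '%s\n' "$attempt1"
printf '%s\n' "$attempt2"
```

The first command prints rows such as `BBB_2765 866 HTG`, and the second prints a bare `BBB_2` for every row.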

Maybe awk is what I need? Any suggestions will be helpful.

nrcombs

2 Answers

You can match zero or more digits after AAA_1 using:

sed 's/^AAA_1[0-9]*/BBB_2/' infile.txt > outfile.txt

This regex matches

  • ^ - start of a line
  • AAA_1 - a literal substring
  • [0-9]* - zero or more digits (if any non-space is meant, you may replace it with [^ ]*)
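
To illustrate the difference between the two character classes, here is a small sketch. The row `AAA_17b5 866 HTG` is a made-up example (it is not from the question) in which the first column contains a letter after the digits, and the variable names are only for illustration.

```shell
# Hypothetical row whose first column is not purely digits after AAA_1.
row='AAA_17b5 866 HTG'

# [0-9]* stops at the first non-digit, so "b5" is left behind.
digits_only=$(printf '%s\n' "$row" | sed 's/^AAA_1[0-9]*/BBB_2/')

# [^ ]* consumes everything up to the first space, i.e. the whole first column.
non_space=$(printf '%s\n' "$row" | sed 's/^AAA_1[^ ]*/BBB_2/')

printf '%s\n' "$digits_only"   # BBB_2b5 866 HTG
printf '%s\n' "$non_space"     # BBB_2 866 HTG
```

If the columns might be separated by tabs rather than spaces, the POSIX class `[^[:space:]]*` can be used in place of `[^ ]*` so that the match also stops at a tab.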
Wiktor Stribiżew

A simple awk solution:

awk '/^AAA_1/ { $1 = "BBB_2" } 1' file

BBB_2 866 HTG
BBB_2 987 IGA
BBB_2 413 BOK
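
One thing to keep in mind: assigning to $1 makes awk rebuild the line using the output field separator, which defaults to a single space, so tabs or runs of spaces on modified lines are collapsed. If the real file is tab-separated (an assumption, since the question does not say), a variant along these lines should preserve the tabs. The second input row is invented to show that non-matching lines pass through unchanged.

```shell
# Hypothetical tab-separated input: one matching row and one that should be left alone.
tsv=$(printf 'AAA_1765\t866\tHTG\nCCC_0001\t120\tXYZ')

# FS and OFS are both set to a tab so the rebuilt record keeps its separators.
# $1 ~ /^AAA_1/ tests only the first field; the trailing 1 prints every record.
result=$(printf '%s\n' "$tsv" |
  awk 'BEGIN { FS = OFS = "\t" } $1 ~ /^AAA_1/ { $1 = "BBB_2" } 1')

printf '%s\n' "$result"
```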
anubhava