
I have files where missing data is inserted as '+', so lines look like this:

substring1+++++substring2++++++++++++++substring3+substring4

I want to replace every run of more than 5 '+' characters with 'MISSING'. This makes the files more readable for my team and makes it easier to tell missing data apart from data entered as '+' (up to 5 in a row is allowed). So far I have:

while read l; do
  echo "${l//['([+])\1{5}']/'MISSING'}"
done < /path/file.txt

But this replaces every single '+' with 'MISSING'. I need each run of more than 5 '+' to become 'MISSING' just once.

Thanks in advance.

Niwatori
    You can't use regex in Bash variable expansion. Use `sed 's/+\{6,\}/MISSING/g' <<< "$l"`. Or, without reading lines at all, `sed 's/+\{6,\}/MISSING/g' /path/file.txt` – Wiktor Stribiżew May 05 '20 at 08:18
    Hi there! Nothing to add to the answer you selected, it's perfect. Anyway, I think this [answer](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) could be useful for you. – Francesco May 05 '20 at 09:30



You can't use regex in Bash variable expansion: ${var//pattern/replacement} takes a shell glob pattern, not a regular expression.
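If you would rather stay in pure Bash, regular expressions are available through [[ string =~ regex ]], which uses POSIX ERE and stores the matched text in BASH_REMATCH. The following is a minimal sketch, not something from the original post: replace_missing is a hypothetical helper that turns each run of 6 or more '+' (more than the 5 allowed) into a single MISSING. For whole files, a single sed call is usually much faster than a per-line shell loop.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): replace every run of
# 6 or more '+' in one line with MISSING, using Bash's built-in ERE matching.
replace_missing() {
  local line=$1 out=''
  local re='[+]{6,}'   # ERE: a literal '+' (bracketed) repeated 6 or more times
  # [[ =~ ]] finds the leftmost, longest match; BASH_REMATCH[0] holds its text.
  while [[ $line =~ $re ]]; do
    # Keep the text before the run, then append the placeholder.
    out+=${line%%"${BASH_REMATCH[0]}"*}MISSING
    # Drop everything up to and including the run, then look for the next one.
    line=${line#*"${BASH_REMATCH[0]}"}
  done
  printf '%s\n' "$out$line"
}

# Same loop shape as in the question, but with IFS= read -r so leading
# whitespace and backslashes in each line are preserved.
while IFS= read -r l; do
  replace_missing "$l"
done <<< 'substring1+++++substring2++++++++++++++substring3+substring4'
# => substring1+++++substring2MISSINGsubstring3+substring4
```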

In your loop, you may use

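sed 's/+\{6,\}/MISSING/g' <<< "$l"

Or you may run sed directly on the file, with no shell loop at all:

sed 's/+\{6,\}/MISSING/g' /path/file.txt

The +\{6,\} POSIX BRE pattern matches a literal + (in BRE, a bare + has no special meaning) 6 or more times (\{6,\}). Runs of up to 5 '+' are therefore left as they are, and each longer run is replaced by a single MISSING.

For example, with the sample line from the question:

sed 's/+\{6,\}/MISSING/g' <<< "substring1+++++substring2++++++++++++++substring3+substring4"
# => substring1+++++substring2MISSINGsubstring3+substring4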
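If you prefer extended regular expressions, the same "more than 5" rule can be written with sed -E, which both GNU sed and BSD/macOS sed accept. In ERE a bare + is the "one or more" quantifier, so the literal plus has to be written as the bracket expression [+]. This is a sketch; mark_missing is a hypothetical wrapper name, and it passes any file arguments through to sed (with none, it reads standard input).

```bash
#!/bin/sh
# Hypothetical wrapper: ERE form of the replacement.
# [+] is a literal plus; {6,} means "6 or more times", i.e. more than the 5 allowed.
mark_missing() {
  sed -E 's/[+]{6,}/MISSING/g' "$@"
}

printf '%s\n' 'substring1+++++substring2++++++++++++++substring3+substring4' | mark_missing
# => substring1+++++substring2MISSINGsubstring3+substring4
```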

If you need to modify the file itself, use any of the techniques described in "sed edit file in place".
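One caveat with in-place editing: GNU sed accepts sed -i, while BSD/macOS sed requires an argument (sed -i '' for no backup), so the same command line does not work on both. A portable sketch that avoids -i altogether writes the result to a temporary file and copies it back. replace_missing_in_file is a hypothetical name, and mktemp is assumed to be available (it is on Linux and macOS).

```bash
#!/bin/sh
# Hypothetical helper: apply the substitution to a file "in place" without
# relying on sed -i, whose syntax differs between GNU and BSD sed.
replace_missing_in_file() {
  tmp=$(mktemp) || return 1
  if sed 's/+\{6,\}/MISSING/g' "$1" > "$tmp"; then
    # Copy back with cat (rather than mv) so the original file keeps
    # its permissions and ownership.
    cat "$tmp" > "$1"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Usage, with the path from the question:
# replace_missing_in_file /path/file.txt
```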

Wiktor Stribiżew