substitute text with equal length using sed

Question

Is there a way to replace a pattern with an equal-length run of something else (e.g. dots, zeros etc.) using sed? Like this:

maci:/ san$ echo "She sells sea shells by the sea shore" | sed 's/\(sh[a-z]*\)/../gI'
.. sells sea .. by the sea ..

("I" requires a newer version of sed to ignore case)
This was easy: the word that starts with "sh" is replaced by double dots (..) but how do I make it something like this: ... sells sea ...... by the sea .....

Any idea? Cheers!

Why do you care which standard unix tool(s) can do this? This is very clearly a trivial job for awk but why even bring up using sed or any other tool? — Ed Morton, Apr 15 '13 at 02:38
do you mean you need sed-only solution, and without any external commands/program supports? — Kent, Apr 15 '13 at 10:34
Well, it doesn't have to be "using sed" but that's what I wanted to do for better visibility within our group. Cheers!! — MacUsers, Apr 16 '13 at 08:06

score 8 · Answer 1 · answered Apr 14 '13 at 23:50

My suspicion is that you can't do it in standard sed, but you could do it with Perl or something else with more powerful regex handling.

$ echo "She sells sea shells by the sea shore" |
> perl -pe 's/(sh[a-z]*)/"." x length($1)/gei'
... sells sea ...... by the sea .....
$

The e modifier means that the replacement pattern is executable Perl script; in this case, it repeats the character . as many times as there are characters in the matched pattern. The g modifier repeats across the line; the i modifier is for case-insensitive matching. The -p option to Perl prints each line after the processing in the script specified by the -e option — the substitute command.

Jonathan Leffler


Friend of mine showed me the exact same way you did with Perl. As you said, it's probably not possible with sed at all. Cheers!! – MacUsers Apr 15 '13 at 00:37
What does `"." x length($1)` mean? – Ed Morton Apr 15 '13 at 02:52
The `x` operator repeats the string on the LHS as many times as the number on the RHS. Thus, `"0123" x 1024` generates a string of 1024 repeats of '`0123`'. In this context, the value `length($1)` is the number of characters in the string captured by the `(...)` parentheses in the search regex, so `"." x length($1)` generates one dot for each character in the word matched by `(sh[a-z]*)`. The regex could be tightened so it doesn't match `mashed` (as it stands, you'd get `ma....` out of it). The `\b` (word boundary) regex term before and after would resolve that problem. – Jonathan Leffler Apr 15 '13 at 02:56
Thanks for the explanation. So the letter "x" is an operator? Bit of a surprise - I wonder why the perl powers that be chose to use a character instead of "*" or even a function to make it clearer.... – Ed Morton Apr 15 '13 at 18:57

score 6 · Answer 2 · answered Jan 04 '14 at 17:09

An old question, but I found a nice and relatively short one-line sed solution:

sed ':a;s/\([Ss]h\.*\)[^\. ]/\1./;ta;s/[Ss]h/../g'

Works by replacing one character at a time in a loop.

:a; start a loop

s/\([Ss]h\.*\)[^\. ] search for an sh followed by any number of .s (our completed work so far) followed by a non-dot, non-space character (what we're going to replace)

/\1./; replace it by our completed work so far plus another dot.

ta; if we made any substitution, loop, otherwise...

s/[Ss]h/../g replace the shs with two .s and call it a day.
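
The loop can be checked by running it without the final substitution, which exposes the intermediate state the explanation calls "our completed work so far". This is only a sketch: both function names are hypothetical, and the script is split into separate -e expressions on the assumption that this travels better to non-GNU seds than the semicolon form. The bracket expression is written [^. ] because a dot needs no escaping inside brackets.

```shell
# Hypothetical helper: only the loop, without the final "sh -> .." step.
sh_loop_only() {
  sed -e ':a' -e 's/\([Ss]h\.*\)[^. ]/\1./' -e 'ta'
}

# Hypothetical helper: the full command from the answer.
sh_mask() {
  sed -e ':a' -e 's/\([Ss]h\.*\)[^. ]/\1./' -e 'ta' -e 's/[Ss]h/../g'
}

echo "She sells sea shells by the sea shore" | sh_loop_only
# prints: Sh. sells sea sh.... by the sea sh...

echo "She sells sea shells by the sea shore" | sh_mask
# prints: ... sells sea ...... by the sea .....
```

Note that the pattern is not tied to the start of a word, so `mashed` comes out as `ma....`, and punctuation attached to a matching word is dotted as well.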

Excellent Martin!! Very short and lot easier to understand. Extra thanks for the explanation. — MacUsers, Jan 04 '14 at 22:46

score 5 · Answer 3 · answered Apr 14 '13 at 23:44

does this awk-oneliner do the job for you?

awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1' file

test with your data:

kent$  echo "She sells sea shells by the sea shore"|awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1'
... sells sea ...... by the sea .....

Kent


thanks for 'awk' bit; just wondering if it's possible with `sed` at all. Cheers!! – MacUsers Apr 15 '13 at 00:12

score 4 · Answer 4 · answered Apr 15 '13 at 02:49

$ echo "She sells sea shells by the sea shore" |
awk '{
   head = ""
   tail = $0
   while ( match(tolower(tail),/sh[a-z]*/) ) {
      dots = sprintf("%*s",RLENGTH,"")
      gsub(/ /,".",dots)
      head = head substr(tail,1,RSTART-1) dots
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'
... sells sea ...... by the sea .....

score 3 · Answer 5 · edited Jun 20 '20 at 09:12

As noted by others, sed is not well suited for this task. It is of course possible; here's one example that works on single lines with space-separated words:

echo "She sells sea shells by the sea shore" |
sed 's/ /\n/g' | sed '/^[Ss]h/ s/[^[:punct:]]/./g' | sed ':a;N;$!ba;s/\n/ /g'

Output:

... sells sea ...... by the sea .....

The first 'sed' replaces spaces by newlines, the second does the dotting, the third removes newlines as shown in this answer.

If you have unpredictable word separators and/or paragraphs, this approach soon becomes unmanageable.
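
The single-line limitation is easy to see by feeding the pipeline two lines. A small sketch, assuming GNU sed (for `\n` in the replacement); the function name dot_sh_words is hypothetical and simply wraps the three stages shown above.

```shell
# Hypothetical wrapper around the three-stage pipeline (GNU sed assumed).
dot_sh_words() {
  sed 's/ /\n/g' |
    sed '/^[Ss]h/ s/[^[:punct:]]/./g' |
    sed ':a;N;$!ba;s/\n/ /g'
}

# Two input lines come back as one: the last stage cannot tell original
# newlines from the ones inserted by the first stage.
printf 'She sells\nsea shells\n' | dot_sh_words
# prints: ... sells sea ......
```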

Edit - multi-line alternatives

Here's one way to handle multi-line input, inspired by Kent's comments (GNU sed):

echo "
She sells sea shells by the sea shore She sells sea shells by the sea shore,
She sells sea shells by the sea shore She sells sea shells by the sea shore
 She sells sea shells by the sea shore She sells sea shells by the sea shore
" |

# Add a \0 to the end of the line and surround punctuations and whitespace by \n 
sed 's/$/\x00/; s/[[:punct:][:space:]]/\n&\n/g' |

# Replace the matched word by dots
sed '/^[Ss]h.*/ s/[^\x00]/./g' | 

# Join lines that were separated by the first sed
sed ':a;/\x00/!{N;ba}; s/\n//g'

Output:

... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....,
... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
 ... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....

answered Apr 15 '13 at 11:11

Thor


This will only work on single-line input, because after the first sed you cannot tell whether a `\n` came from you or from the original input. – Kent Apr 15 '13 at 11:54
this will support multi-lines input. `sed -r 's/(^| )/\n\x98/g' file|sed '/^\x98[Ss]h/ s/././g'|sed -n '1h;1!H;${x;s/\n\x98/ /g;p}'` still the 3 seds approach – Kent Apr 15 '13 at 12:06
that was the idea I got in my lunch break... but after lunch I saw your answer, so I better not create a new answer. you can reference in your answer if you think it helps. – Kent Apr 15 '13 at 12:09
I see what you mean, for the line-beginning there was no space, but we added one. Later we must identify this very position. it could be solved by adding another char ,e.g. `\x99`, and yes, there would be two more `s/../../` statements. But it is the idea to try to make it support multilines. – Kent Apr 15 '13 at 12:44
OK.. sorry If my comments made you unhappy, I don't mean it. I just saw you got a similar idea as mine except for the multi-lines parts. well, maybe I should not post those comments at all.... +1 your answer: 3 sed lines but easy to read. – Kent Apr 15 '13 at 13:03
@Thor,@Kent: my fault, w.r.t. multi-line confusion. Even though I didn't explicitly mention about the multi-line thing, that was actually my goal - running this on files in a directory. Thanks to both of you. Cheers!! – MacUsers Apr 16 '13 at 08:10
Excellent explanation! Took much from it! Thank you!! – mark_infinite Jan 28 '21 at 23:41

potong · Accepted Answer · 2021-02-03T11:05:09.770

This might work for you (GNU sed):

sed -r ':a;/\b[Ss]h\S+/!b;s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/././g;G;s/(.*)\n(.*)\n.*\n/\2\1/;ta' file

In essence, it copies a word beginning with sh or Sh, replaces each character with . and then re-inserts the new string back into the original. When all occurrences of the search string have been exhausted, it prints out the line.
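
For readers who find the one-liner dense, here is the same command sequence written one command per line with comments. It is a sketch, not a different method: the function name mask_sh_gnu is hypothetical, the comments are an interpretation of each step, and GNU sed is assumed (for `\b`, `\S`, and `\n` in the replacement).

```shell
# Hypothetical wrapper; the sed commands are those of the one-liner above.
mask_sh_gnu() {
  sed -r '
    :a
    # No remaining word starting with sh or Sh: branch to the end and print.
    /\b[Ss]h\S+/!b
    # Fence the first such word with newlines (empty regex = last regex used).
    s//\n&\n/
    # Save the fenced line in the hold space.
    h
    # Reduce the pattern space to the fenced word, then dot every character.
    s/.*\n(.*)\n.*/\1/
    s/././g
    # Append the saved line: dots, newline, prefix, newline, word, newline, rest.
    G
    # Rebuild as prefix + dots; the unmatched rest stays where it is.
    s/(.*)\n(.*)\n.*\n/\2\1/
    # A substitution was made, so go round again for the next word.
    ta
  '
}

echo "She sells sea shells by the sea shore" | mask_sh_gnu
# prints: ... sells sea ...... by the sea .....
```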

An alternative:

sed -E 's/\S+/\n&/g;s#.*#echo "&"|sed "/^sh/Is/\\S/./g"#e;s/\n//g' file

edited Feb 03 '21 at 11:05

answered Apr 15 '13 at 12:03

potong


+1 a nice sed - one liner. .. I can learn things from this one. – Kent Apr 15 '13 at 13:05
@potong: A bit long and towards the "getting out of control" edge but that's probably the only way of doing it using sed. As I said `sed` in my OP, I'll accept this but probably I'll use 'awk' as Kent suggested. Cheers!! – MacUsers Apr 16 '13 at 08:14
