6

Is there a way to replace a pattern with an equal-length run of something else (e.g. dots, zeros, etc.) using sed? Like this:

maci:/ san$ echo "She sells sea shells by the sea shore" | sed 's/\(sh[a-z]*\)/../gI'
.. sells sea .. by the sea ..

("I" requires a newer version of sed to ignore case)
This was easy: each word that starts with "sh" is replaced by two dots (..). But how do I make the replacement as long as the match, like this: ... sells sea ...... by the sea .....

Any idea? Cheers!

MacUsers
  • Why do you care which standard unix tool(s) can do this? This is very clearly a trivial job for awk but why even bring up using sed or any other tool? – Ed Morton Apr 15 '13 at 02:38
  • Do you mean you need a sed-only solution, without support from any external commands or programs? – Kent Apr 15 '13 at 10:34
  • Well, it doesn't have to be "using sed" but that's what I wanted to do for better visibility within our group. Cheers!! – MacUsers Apr 16 '13 at 08:06

6 Answers

8

My suspicion is that you can't do it in standard sed, but you could do it with Perl or something else with more powerful regex handling.

$ echo "She sells sea shells by the sea shore" |
> perl -pe 's/(sh[a-z]*)/"." x length($1)/gei'
... sells sea ...... by the sea .....
$

The e modifier means that the replacement is evaluated as Perl code; in this case, it repeats the character . as many times as there are characters in the matched text. The g modifier applies the substitution to every match on the line; the i modifier makes the match case-insensitive. The -p option to Perl prints each line after it has been processed by the script given with the -e option (here, the substitute command).

Jonathan Leffler
  • A friend of mine showed me exactly the same Perl approach. As you said, it's probably not possible with sed at all. Cheers!! – MacUsers Apr 15 '13 at 00:37
  • What does `"." x length($1)` mean? – Ed Morton Apr 15 '13 at 02:52
  • 1
    The `x` operator repeats the string on the LHS as many times as the number on the RHS. Thus, `"0123" x 1024` generates a string of 1024 repeats of '`0123`'. In this context, the value `length($1)` is the number of characters in the string captured by the `(...)` parentheses in the search regex, so `"." x length($1)` generates one dot for each character in the word matched by `(sh[a-z]*)`. The regex could be tightened so it doesn't match `mashed` (as it stands, you'd get `ma....` out of it). The `\b` (word boundary) regex term before and after would resolve that problem. – Jonathan Leffler Apr 15 '13 at 02:56
  • Thanks for the explanation. So the letter "x" is an operator? Bit of a surprise - I wonder why the perl powers that be chose to use a character instead of "*" or even a function to make it clearer.... – Ed Morton Apr 15 '13 at 18:57
6

An old question, but I found a nice and relatively short one-line sed solution:

sed ':a;s/\([Ss]h\.*\)[^\. ]/\1./;ta;s/[Ss]h/../g'

Works by replacing one character at a time in a loop.

:a; start a loop

s/\([Ss]h\.*\)[^\. ] search for an sh followed by any number of dots (our completed work so far), followed by one character that is neither a dot nor a space (the character we're going to replace)

/\1./; replace it with our completed work so far plus one more dot

ta; if we made a substitution, loop back to a; otherwise...

s/[Ss]h/../g replace each remaining sh with two dots and call it a day.
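
Putting commands after a label on the same line, as in `:a;s/...`, works in GNU sed, but some other implementations (BSD sed, for example) may read everything up to the end of the line as the label name. A sketch of the same loop split into separate `-e` expressions, which should sidestep that; the function name `dot_sh` is invented here for illustration.

```shell
# Hypothetical wrapper around the loop described above.
# Expression 1: define label a.
# Expression 2: turn one more character after "sh" + dots into a dot.
# Expression 3: if that substitution happened, jump back to a.
# Expression 4: finally replace the "sh" itself with two dots.
dot_sh() {
  sed -e ':a' \
      -e 's/\([Ss]h\.*\)[^\. ]/\1./' \
      -e 'ta' \
      -e 's/[Ss]h/../g'
}

echo "She sells sea shells by the sea shore" | dot_sh
# prints: ... sells sea ...... by the sea .....
```

Like the one-liner, this matches `sh` anywhere in a word, not only at the start, so `mashed` comes out as `ma....`.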

Marty Neal
5

Does this awk one-liner do the job for you?

awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1' file

Test with your data:

kent$  echo "She sells sea shells by the sea shore"|awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1'
... sells sea ...... by the sea .....
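
The one-liner, spread out with comments; a sketch in which the function name `dot_fields` is hypothetical. One side effect worth knowing: because a field is modified, awk rebuilds the record with single spaces (the default `OFS`) between fields, so runs of whitespace on a changed line collapse.

```shell
# Hypothetical wrapper around the awk one-liner.
dot_fields() {
  awk '{
    # Walk the whitespace-separated fields of the current line.
    for (i = 1; i <= NF; i++)
      # A field that starts with sh or Sh has every character replaced by a dot.
      if ($i ~ /^[Ss]h/)
        gsub(/./, ".", $i)
    print
  }'
}

echo "She sells sea shells by the sea shore" | dot_fields
# prints: ... sells sea ...... by the sea .....

printf 'She    sells\n' | dot_fields
# prints: ... sells   (the four spaces come back as one)
```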
Kent
4
$ echo "She sells sea shells by the sea shore" |
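awk '{
   head = ""      # text already processed
   tail = $0      # text still to be scanned
   # match() sets RSTART and RLENGTH for the leftmost match in tail
   while ( match(tolower(tail),/sh[a-z]*/) ) {
      dots = sprintf("%*s",RLENGTH,"")   # RLENGTH spaces ...
      gsub(/ /,".",dots)                 # ... turned into dots
      head = head substr(tail,1,RSTART-1) dots
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'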
... sells sea ...... by the sea .....
Ed Morton
3

As noted by others, sed is not well suited for this task. It is of course possible; here's one example that works on single lines with space-separated words:

echo "She sells sea shells by the sea shore" |
sed 's/ /\n/g' | sed '/^[Ss]h/ s/[^[:punct:]]/./g' | sed ':a;N;$!ba;s/\n/ /g'

Output:

... sells sea ...... by the sea .....

The first sed replaces spaces with newlines, the second does the dotting, and the third removes the newlines, as shown in this answer.

If you have unpredictable word separators and/or paragraphs, this approach soon becomes unmanageable.
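
For comparison, the same split, dot, and join steps can be written with `tr` and `paste` doing the splitting and joining instead of sed, which avoids relying on `\n` in a sed replacement. This is a substitution of tools for the first and third stages, not the pipeline above, and the function name `dot_words` is made up for illustration. It carries the same single-line limitation.

```shell
# Hypothetical helper: split on spaces, dot the matching words, join again.
dot_words() {
  tr ' ' '\n' |                             # one word per line
    sed '/^[Ss]h/ s/[^[:punct:]]/./g' |     # dot out words starting with sh/Sh
    paste -sd ' ' -                         # join the lines back with spaces
}

echo "She sells sea shells by the sea shore" | dot_words
# prints: ... sells sea ...... by the sea .....
```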

Edit - multi-line alternatives

Here's one way to handle multi-line input, inspired by Kent's comments (GNU sed):

echo "
She sells sea shells by the sea shore She sells sea shells by the sea shore,
She sells sea shells by the sea shore She sells sea shells by the sea shore
 She sells sea shells by the sea shore She sells sea shells by the sea shore
" |

# Add a \0 to the end of each line and surround punctuation and whitespace with \n
sed 's/$/\x00/; s/[[:punct:][:space:]]/\n&\n/g' |

# Replace the matched word by dots
sed '/^[Ss]h.*/ s/[^\x00]/./g' | 

# Join lines that were separated by the first sed
sed ':a;/\x00/!{N;ba}; s/\n//g'

Output:

... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....,
... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
 ... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
Thor
  • This will only work on single-line input, because after the first sed you cannot tell whether a `\n` came from you or from the original input. – Kent Apr 15 '13 at 11:54
  • This will support multi-line input: `sed -r 's/(^| )/\n\x98/g' file|sed '/^\x98[Ss]h/ s/././g'|sed -n '1h;1!H;${x;s/\n\x98/ /g;p}'` It is still the three-sed approach. – Kent Apr 15 '13 at 12:06
  • That was the idea I had during my lunch break, but after lunch I saw your answer, so I'd better not create a new one. You can reference it in your answer if you think it helps. – Kent Apr 15 '13 at 12:09
  • I see what you mean: at the beginning of a line there was no space, but we added one, and later we must identify that very position. It could be solved by adding another character, e.g. `\x99`, and yes, there would be two more `s/../../` statements. But the idea is to try to make it support multiple lines. – Kent Apr 15 '13 at 12:44
  • OK, sorry if my comments made you unhappy; I didn't mean that. I just saw that you had a similar idea to mine, except for the multi-line part. Well, maybe I should not have posted those comments at all... +1 for your answer: three sed lines, but easy to read. – Kent Apr 15 '13 at 13:03
  • @Thor, @Kent: my fault regarding the multi-line confusion. Even though I didn't explicitly mention multi-line input, that was actually my goal: running this on files in a directory. Thanks to both of you. Cheers!! – MacUsers Apr 16 '13 at 08:10
  • Excellent explanation! Took much from it! Thank you!! – mark_infinite Jan 28 '21 at 23:41
3

This might work for you (GNU sed):

sed -r ':a;/\b[Ss]h\S+/!b;s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/././g;G;s/(.*)\n(.*)\n.*\n/\2\1/;ta' file

In essence, it copies a word beginning with sh or Sh, replaces each character with ., and then re-inserts the new string into the original. When all occurrences of the search string have been exhausted, it prints the line.
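
The same script, one command per line with comments, may make the copy, dot, and re-insert steps easier to follow. This is a sketch that assumes GNU sed (it relies on `\b`, `\S`, and `\n` in the replacement); the function name `dot_sh_gnu` is hypothetical.

```shell
# Hypothetical wrapper; reads lines on stdin.
dot_sh_gnu() {
  sed -E '
    :a
    # No word starting with sh/Sh left: branch to the end, which prints the line.
    /\b[Ss]h\S+/!b
    # Fence the first match with newlines (the empty regex reuses the last one).
    s//\n&\n/
    # Save the fenced line in the hold space.
    h
    # Keep only the fenced word, then turn each of its characters into a dot.
    s/.*\n(.*)\n.*/\1/
    s/././g
    # Append the saved line, giving: dots \n prefix \n word \n suffix
    G
    # Reassemble as prefix + dots; the unmatched suffix stays where it is.
    s/(.*)\n(.*)\n.*\n/\2\1/
    ta
  '
}

echo "She sells sea shells by the sea shore" | dot_sh_gnu
# prints: ... sells sea ...... by the sea .....
```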

An alternative:

sed -E 's/\S+/\n&/g;s#.*#echo "&"|sed "/^sh/Is/\\S/./g"#e;s/\n//g' file
potong
  • +1 a nice sed - one liner. .. I can learn things from this one. – Kent Apr 15 '13 at 13:05
  • 1
    @potong: A bit long and towards the "getting out of control" edge but that's probably the only way of doing it using sed. As I said `sed` in my OP, I'll accept this but probably I'll use 'awk' as Kent suggested. Cheers!! – MacUsers Apr 16 '13 at 08:14